Abstract
Neural architectures have recently come to play a significant role in Word Sense Disambiguation (WSD). Supervised methods appear to outperform their rivals, and their performance depends mostly on the size of the training data. Numerous human-annotated datasets for the WSD task have been constructed for English. However, low-resource languages (LRLs) still face difficulty in finding suitable data resources, and gathering and annotating a sufficient amount of training data is time-consuming and labor-intensive. To address this problem, in this paper we investigate the possibility of using a semi-supervised, context-based WSD approach for data augmentation, so that the augmented data can later be used for supervised learning. Since it is difficult even to find WSD evaluation datasets for LRLs, in this study we use English datasets to build a proof of concept and to evaluate the applicability of the approach to LRLs. Our semi-supervised approach uses a seed set and context embeddings. We test 9 different context-based language models (including ELMo, BERT, and RoBERTa) and investigate their impact on WSD. We improve on our baseline results by up to about 28 percentage points in accuracy (baseline with ELMo: 50.39%; ELMo Sense Seed Based Average Similarity Model: 78.06%). Our initial findings indicate that the proposed approach is very promising for augmenting the WSD datasets of LRLs.
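The abstract names a "Sense Seed Based Average Similarity Model" but does not spell out its details, so the following is only a minimal sketch of one plausible reading: each unlabeled context embedding is assigned the sense whose seed embeddings it is most similar to on average. The function names, the use of cosine similarity, and the toy three-dimensional vectors (standing in for real ELMo or BERT context embeddings) are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def cosine_similarities(vector, matrix):
    """Cosine similarity between one vector and each row of a matrix."""
    vector = vector / np.linalg.norm(vector)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix @ vector


def assign_sense(context_vec, seed_vecs_by_sense):
    """Pick the sense with the highest mean similarity to its seed vectors.

    Assumption: `seed_vecs_by_sense` maps a sense label to a small list of
    context embeddings taken from hand-labeled seed sentences.
    """
    context_vec = np.asarray(context_vec, dtype=float)
    scores = {}
    for sense, seeds in seed_vecs_by_sense.items():
        seed_matrix = np.asarray(seeds, dtype=float)
        scores[sense] = float(np.mean(cosine_similarities(context_vec, seed_matrix)))
    best_sense = max(scores, key=scores.get)
    return best_sense, scores


# Toy seed set: hypothetical 3-dimensional "embeddings" for two senses of "bank".
seeds = {
    "bank/finance": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "bank/river": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}

# An unlabeled occurrence whose embedding lies close to the finance seeds.
unlabeled = np.array([0.8, 0.3, 0.1])
predicted_sense, sense_scores = assign_sense(unlabeled, seeds)
print(predicted_sense, sense_scores)
```

In a data augmentation setting, occurrences labeled this way could be added to the training set, possibly only when the winning score clears a confidence threshold; that filtering step is likewise an assumption and is not described in the abstract.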
Original language | English |
---|---|
Title of host publication | 5th International Conference on Computer Science and Engineering, UBMK 2020 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 337-342 |
Number of pages | 6 |
ISBN (Electronic) | 9781728175652 |
DOIs | |
Publication status | Published - Sep 2020 |
Event | 5th International Conference on Computer Science and Engineering, UBMK 2020 - Diyarbakir, Turkey Duration: 9 Sep 2020 → 10 Sep 2020 |
Publication series
Name | 5th International Conference on Computer Science and Engineering, UBMK 2020 |
---|
Conference
Conference | 5th International Conference on Computer Science and Engineering, UBMK 2020 |
---|---|
Country/Territory | Turkey |
City | Diyarbakir |
Period | 9/09/20 → 10/09/20 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.
Funding
This work is part of a research project supported by ITU Scientific Research Projects, grant no. MDK-2017-40968. Arda Inceoglu was supported by the Turkcell-Istanbul Technical University Researcher Funding Program.
Funders | Funder number |
---|---|
Turkcell-Istanbul Technical University | |
International Technological University | MDK-2017-40968 |