Abstract
Neural architectures have recently come to play a significant role in Word Sense Disambiguation (WSD). Supervised methods appear to outperform their rivals, and their performance depends largely on the size of the training data. Numerous human-annotated datasets for the WSD task have been constructed for English. However, low-resource languages (LRLs) still lack suitable data resources, and gathering and annotating a sufficient amount of training data is time-consuming and labor-intensive. To address this problem, we investigate the possibility of using a semi-supervised, context-based WSD approach for data augmentation, so that the augmented data can later be used for supervised learning. Since even WSD evaluation datasets are difficult to find for LRLs, we use English datasets in this study to build a proof of concept and to evaluate the applicability of the approach to LRLs. Our semi-supervised approach uses a seed set and context embeddings. We test 9 different context-based language models (including ELMo, BERT, and RoBERTa) and investigate their impact on WSD. In terms of accuracy, we improve on our baseline results by up to about 28 percentage points (baseline with ELMo: 50.39%; ELMo Sense Seed Based Average Similarity Model: 78.06%). Our initial findings indicate that the proposed approach is very promising for augmenting the WSD datasets of LRLs.
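The abstract names a "Sense Seed Based Average Similarity Model" but does not spell out how it works. The sketch below is one plausible reading, offered only as an illustration: each sense has a small seed set of context embeddings, and a new occurrence is assigned the sense whose seeds have the highest mean cosine similarity to it. The function names (`cosine`, `predict_sense`), the sense labels, and the 3-dimensional toy vectors are all hypothetical; in practice the vectors would come from a contextual language model such as ELMo or BERT.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def predict_sense(context_vec, seeds):
    """Pick the sense whose seed embeddings are, on average, most similar.

    seeds maps a sense label to a list of seed embedding vectors.
    Returns the chosen label and the per-sense average similarity.
    """
    scores = {
        sense: float(np.mean([cosine(context_vec, s) for s in vecs]))
        for sense, vecs in seeds.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins for contextual embeddings (assumption: real ones are
# produced by a language model and have hundreds of dimensions).
seeds = {
    "bank/finance": [np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.2, 0.1])],
    "bank/river": [np.array([0.0, 0.2, 1.0]), np.array([0.1, 0.1, 0.9])],
}
query = np.array([0.8, 0.3, 0.1])  # embedding of an unlabeled occurrence

label, scores = predict_sense(query, seeds)
print(label, scores)
```

Under this reading, occurrences labeled with enough confidence could be added to the training data, which is the data augmentation use the abstract describes. A similarity threshold for accepting a label would be a natural extension, but the abstract does not say whether one is used.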
Original language | English |
---|---|
Title of host publication | 5th International Conference on Computer Science and Engineering, UBMK 2020 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 337-342 |
Number of pages | 6 |
ISBN (Electronic) | 9781728175652 |
Publication status | Published - Sept 2020 |
Event | 5th International Conference on Computer Science and Engineering, UBMK 2020 - Diyarbakir, Turkey; Duration: 9 Sept 2020 → 10 Sept 2020 |
Publication series
Name | 5th International Conference on Computer Science and Engineering, UBMK 2020 |
---|---|
Conference
Conference | 5th International Conference on Computer Science and Engineering, UBMK 2020 |
---|---|
Country/Territory | Turkey |
City | Diyarbakir |
Period | 9/09/20 → 10/09/20 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.
Keywords
- Contextual embeddings
- Data augmentation
- Deep learning
- Word sense disambiguation