Preliminary investigation on using semi-supervised contextual word sense disambiguation for data augmentation

Dilara Torunoglu-Selamet, Arda Inceoglu, Gulsen Eryigit

Research output: Conference contribution (peer-reviewed)

2 Citations (Scopus)

Abstract

Neural architectures have recently come to play a significant role in the task of Word Sense Disambiguation (WSD). Supervised methods seem to be ahead of their rivals, and their performance mostly depends on the size of the training data. Numerous human-annotated datasets have been constructed for the WSD task in English. However, low-resource languages (LRLs) still face difficulty in finding suitable data resources. Gathering and annotating a sufficient amount of training data is time-consuming and labor-intensive. To address this problem, in this paper we investigate the possibility of using a semi-supervised, context-based WSD approach for data augmentation, so that the augmented data can later be used for supervised learning. Since it is difficult to find even WSD evaluation datasets for LRLs, in this study we use English datasets to build a proof of concept and to evaluate the applicability of the approach to LRLs. Our semi-supervised approach uses a seed set and context embeddings. We test 9 different context-based language models (including ELMo, BERT, and RoBERTa) and investigate their impact on WSD. We improve on our baseline by up to about 28 percentage points in accuracy (baseline with ELMo: 50.39%; ELMo Sense Seed Based Average Similarity Model: 78.06%). Our initial findings indicate that the proposed approach is very promising for the augmentation of WSD datasets for LRLs.
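The abstract names a "Sense Seed Based Average Similarity Model" but does not describe how it works. The sketch below is one plausible reading, not the authors' implementation: each sense has a small seed set of context embeddings, a new occurrence is scored against each sense by its average cosine similarity to that sense's seeds, and the highest-scoring sense is assigned. The function names, the toy 3-dimensional vectors (standing in for ELMo or BERT context embeddings), and the confidence threshold used to filter automatically labeled examples are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_similarity(context_vec, seed_vecs):
    """Mean cosine similarity of one context vector to a sense's seeds."""
    return sum(cosine(context_vec, s) for s in seed_vecs) / len(seed_vecs)


def predict_sense(context_vec, seeds_by_sense):
    """Return the best-scoring sense and the per-sense scores."""
    scores = {
        sense: average_similarity(context_vec, vecs)
        for sense, vecs in seeds_by_sense.items()
        if vecs  # skip senses that have no seed examples
    }
    best = max(scores, key=scores.get)
    return best, scores


def augment(unlabeled, seeds_by_sense, threshold=0.8):
    """Label unlabeled items, keeping only confident predictions.

    The threshold is a hypothetical filtering step, not taken from the paper.
    """
    labeled = []
    for item_id, vec in unlabeled:
        sense, scores = predict_sense(vec, seeds_by_sense)
        if scores[sense] >= threshold:
            labeled.append((item_id, sense))
    return labeled


# Toy seed set: two senses of "bank", two seed embeddings each (made-up values).
SEEDS = {
    "bank/finance": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "bank/river": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}

# Toy unlabeled occurrences; "s3" is deliberately ambiguous.
UNLABELED = [
    ("s1", [0.95, 0.05, 0.0]),
    ("s2", [0.1, 1.0, 0.0]),
    ("s3", [0.5, 0.5, 0.5]),
]

if __name__ == "__main__":
    for item_id, sense in augment(UNLABELED, SEEDS):
        print(item_id, sense)
```

In this toy run, the ambiguous occurrence scores about equally against both senses and falls below the threshold, so it is left out of the augmented set. Whether the paper filters low-confidence predictions in this way is not stated in the abstract.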

Original language: English
Title of host publication: 5th International Conference on Computer Science and Engineering, UBMK 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 337-342
Number of pages: 6
ISBN (Electronic): 9781728175652
DOIs
Publication status: Published - Sep 2020
Event: 5th International Conference on Computer Science and Engineering, UBMK 2020 - Diyarbakir, Turkey
Duration: 9 Sep 2020 to 10 Sep 2020

Publication series

Name: 5th International Conference on Computer Science and Engineering, UBMK 2020

Conference

Conference: 5th International Conference on Computer Science and Engineering, UBMK 2020
Country/Region: Turkey
City: Diyarbakir
Period: 9/09/20 to 10/09/20

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Funding

This work is part of a research project supported by ITU Scientific Research Projects, grant no. MDK-2017-40968. Arda Inceoglu was supported by the Turkcell-Istanbul Technical University Researcher Funding Program.

Funders                                   Funder number
Turkcell-Istanbul Technical University
International Technological University    MDK-2017-40968
