
Towards Turkish Word Embeddings: An Intrinsic Evaluation

  • Oguz Ali Arslan
  • Berfin Duman
  • Hakan Erdem
  • Can Gunyel
  • Bike Sonmez
  • Dogukan Arslan
  • Istanbul Technical University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Effective representation of textual data is a prerequisite for most downstream tasks, which increases the importance of word embedding evaluation methods. The intrinsic approach assesses the similarity between word representations and human judgements. In this paper, we present a comprehensive intrinsic evaluation of Turkish word embedding models with different tasks using task-specific datasets such as SemEval-2017, MC-30, SimVerb-3500 for word similarity, MSR for word analogy, and methods that have not been tested for Turkish before, such as concept categorization with BLESS and ESSLLI and outlier detection with the 8-8-8 dataset. While each of these datasets was originally in English, we translated them into Turkish and trained Word2Vec, fastText and GloVe language models with these datasets from scratch. The results suggest that while Word2Vec is generally more successful in word similarity and outlier detection tasks, fastText outperforms other models in word analogy and concept categorization.

Original language: English
Title of host publication: UBMK 2023 - Proceedings
Subtitle of host publication: 8th International Conference on Computer Science and Engineering
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 564-568
Number of pages: 5
ISBN (Electronic): 9798350340815
DOIs
Publication status: Published - 2023
Event: 8th International Conference on Computer Science and Engineering, UBMK 2023 - Burdur, Türkiye
Duration: 13 Sep 2023 to 15 Sep 2023

Publication series

Name: UBMK 2023 - Proceedings: 8th International Conference on Computer Science and Engineering

Conference

Conference: 8th International Conference on Computer Science and Engineering, UBMK 2023
Country/Region: Türkiye
City: Burdur
Period: 13/09/23 to 15/09/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.
