Annotation and extraction of multiword expressions in Turkish treebanks

Gülşen Eryiğit, Kübra Adalı, Dilara Torunoglu-Selamet, Umut Sulubacak, Tugba Pamay

Research output: Conference contribution › peer-review

8 Citations (Scopus)

Abstract

Multiword expressions (MWEs) have distinctive semantic properties, so their automatic extraction receives special attention from the natural language processing (NLP) and corpus linguistics communities and remains an active research area. Unfortunately, creating the resources this task requires is demanding, and many languages, Turkish among them, lack such resources. This study presents our MWE annotations on recently introduced Turkish treebanks. The annotations cover various types of linguistic units and expressions, including named entities, numerical expressions, idiomatic phrases, verb phrases with auxiliaries, and duplications. The paper aims to provide a benchmark and pave the way for further MWE extraction research for Turkish. To this end, it also reports experimental results on these annotated corpora with seven baseline approaches, a dependency parser, and a previously introduced rule-based extractor. The highest F-scores achieved on these resources are about 60%.

Original language: English
Title of host publication: 11th Workshop on Multiword Expressions, MWE 2015 - in conjunction with the 2015 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, NAACL-HLT 2015
Publisher: Association for Computational Linguistics (ACL)
Pages: 70-76
Number of pages: 7
ISBN (Electronic): 9781941643389
Publication status: Published - 2015
Event: 11th Workshop on Multiword Expressions, MWE 2015 - Denver, United States
Duration: 4 Jun 2015 → …

Publication series

Name: 11th Workshop on Multiword Expressions, MWE 2015 - in conjunction with the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2015

Conference

Conference: 11th Workshop on Multiword Expressions, MWE 2015
Country/Territory: United States
City: Denver
Period: 4/06/15 → …

Bibliographical note

Publisher Copyright:
© NAACL-HLT 2015. All rights reserved.

Funding

We would like to acknowledge that this work is part of a research project entitled “Parsing Web 2.0 Sentences” subsidized by the TUBITAK (Turkish Scientific and Technological Research Council) 1001 program (grant number 112E276) and part of the ICT COST Action IC1207 PARSEME (PARSing and Multi-word Expressions).

Funders and funder numbers
TUBITAK
Turkish Scientific and Technological Research Council: 112E276
European Cooperation in Science and Technology
