The annotation process of the ITU web treebank

Tuğba Pamay, Umut Sulubacak, Dilara Torunoğlu-Selamet, Gülşen Eryiğit

Research output: Conference contribution › Peer-review

11 Citations (SciVal)

Abstract

The potential of processing user-generated texts freely available on the web is widely recognized. However, the language used on the web is non-canonical, so these data cannot be processed with conventional methodologies designed for well-edited formal texts. Procedures for properly annotating raw web data have not been researched as extensively as those for annotating well-edited texts, and this gap is also evident in Turkish language processing. Moreover, there is a considerable shortage of human-annotated corpora derived from Turkish web data. The ITU Web Treebank is the first attempt at a diverse corpus compiled from Turkish texts found on the web. In this paper, we first present our survey of the non-canonical aspects of the language used on the Turkish web. Next, we discuss in detail the annotation procedure followed in the ITU Web Treebank, which was revised for compatibility with the language of the web. Finally, we describe the web-based annotation tool that implements this procedure and on which the treebank was annotated.

Original language: English
Host publication title: LAW 2015 - 9th Linguistic Annotation Workshop, held in conjunction with NAACL 2015 - Proceedings of the Workshop
Editors: Adam Meyers, Ines Rehbein, Heike Zinsmeister
Publisher: Association for Computational Linguistics (ACL)
Pages: 95-101
Number of pages: 7
ISBN (Electronic): 9781941643471
Publication status: Published - 2020
Event: 9th Linguistic Annotation Workshop, LAW 2015, held in conjunction with NAACL 2015 - Denver, United States
Duration: 5 Jun 2015 → …

Publication series

Name: LAW 2015 - 9th Linguistic Annotation Workshop, held in conjunction with NAACL 2015 - Proceedings of the Workshop

Conference

Conference: 9th Linguistic Annotation Workshop, LAW 2015, held in conjunction with NAACL 2015
Country/Territory: United States
City: Denver
Period: 5/06/15 → …

Bibliographical note

Publisher Copyright:
© 2015 Association for Computational Linguistics

Funding

We would like to acknowledge that this work is part of a research project entitled “Parsing Web 2.0 Sentences” subsidized by the TÜBİTAK (Turkish Scientific and Technological Research Council) 1001 program (grant number 112E276) and part of the ICT COST Action IC1207. We would hereby like to offer our sincere gratitude to our colleagues Ayşenur Genç, Can Özbey, Kübra Adalı and Gözde Gül Şahin, who offered additional help during the annotation phase.

Funder: Funder number
TÜBİTAK (Turkish Scientific and Technological Research Council): 112E276
European Cooperation in Science and Technology: IC1207
