Abstract
In this paper, we investigated how extracting different types of multiword expressions (MWEs) affects the accuracy of a data-driven dependency parser for a morphologically rich language (Turkish). We showed that, in the training stage, unifying MWEs of one type, namely compound verb and noun formations, has a negative effect on parsing accuracy because it increases lexical sparsity. Training on a variant of the treebank that excludes this MWE type gave a statistically significant improvement. Our extrinsic evaluation of an ideal MWE recognizer (one that extracts only named entities, duplications, numbers, dates and a predefined list of compound prepositions) showed that preprocessing the test data in this way would improve the labeled parsing accuracy by 1.5%.
Original language | English |
---|---|
Title of host publication | 2nd Workshop on Statistical Parsing of Morphologically Rich Languages, SPMRL 2011 - collocate with the International Workshop on Parsing Technologies, IWPT 2011 - Proceedings |
Editors | Djame Seddah, Reut Tsarfaty, Jennifer Foster |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 45-55 |
Number of pages | 11 |
ISBN (Electronic) | 9781932432732 |
Publication status | Published - 2011 |
Event | 2nd Workshop on Statistical Parsing of Morphologically Rich Languages, SPMRL 2011 - Dublin, Ireland Duration: 6 Oct 2011 → … |
Publication series
Name | 2nd Workshop on Statistical Parsing of Morphologically Rich Languages, SPMRL 2011 - collocate with the International Workshop on Parsing Technologies, IWPT 2011 - Proceedings |
---|
Conference
Conference | 2nd Workshop on Statistical Parsing of Morphologically Rich Languages, SPMRL 2011 |
---|---|
Country/Region | Ireland |
City | Dublin |
Period | 6/10/11 → … |
Bibliographic note
Publisher Copyright: © 2011 Association for Computational Linguistics