Abstract
In this paper, we investigated how extracting different types of multiword expressions (MWEs) affects the accuracy of a data-driven dependency parser for a morphologically rich language, Turkish. We showed that unifying MWEs of one type, namely compound verb and noun formations, in the training stage has a negative effect on parsing accuracy because it increases lexical sparsity. Training on a variant of the treebank that excludes this MWE type gave a statistically significant improvement. Our extrinsic evaluation of an ideal MWE recognizer (one that extracts only named entities, duplications, numbers, dates, and a predefined list of compound prepositions) showed that preprocessing the test data in this way would improve labeled parsing accuracy by 1.5%.
Original language | English |
---|---|
Title of host publication | 2nd Workshop on Statistical Parsing of Morphologically Rich Languages, SPMRL 2011 - collocate with the International Workshop on Parsing Technologies, IWPT 2011 - Proceedings |
Editors | Djame Seddah, Reut Tsarfaty, Jennifer Foster |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 45-55 |
Number of pages | 11 |
ISBN (Electronic) | 9781932432732 |
Publication status | Published - 2011 |
Event | 2nd Workshop on Statistical Parsing of Morphologically Rich Languages, SPMRL 2011 - Dublin, Ireland. Duration: 6 Oct 2011 → … |
Bibliographical note
Publisher Copyright: © 2011 Association for Computational Linguistics