Abstract
This paper reports the highest results in the literature (95% on the MUC metric and 92% on the CoNLL metric) for Turkish named entity recognition, specifically for the task of detecting person, location, and organization entities in general news texts. We give an in-depth analysis of previously reported results and compare against them wherever possible. We use conditional random fields (CRFs) as our statistical model. The paper presents initial explorations of using the rich morphological structure of the Turkish language as features for CRFs, together with some basic and generative gazetteers.
Original language | English |
---|---|
Pages | 2459-2474 |
Number of pages | 16 |
Publication status | Published - 2012 |
Event | 24th International Conference on Computational Linguistics, COLING 2012, Mumbai, India. Duration: 8 Dec 2012 → 15 Dec 2012 |
Conference
Conference | 24th International Conference on Computational Linguistics, COLING 2012 |
---|---|
Country/Territory | India |
City | Mumbai |
Period | 8 Dec 2012 → 15 Dec 2012 |
Keywords
- Conditional random fields
- ENAMEX
- Named entity recognition
- Turkish