Named entity recognition on real data: A preliminary investigation for Turkish

Gokhan Celikkaya, Dilara Torunoglu, Gulsen Eryigit

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

32 Citations (Scopus)

Abstract

Named Entity Recognition (NER) is a well-studied area in natural language processing (NLP), and the results reported in the literature are generally very high (above ~95%) for most languages. Today, most practical natural language applications (e.g. web mining, sentiment analysis, machine translation) focus on real natural language data such as Web 2.0 or speech data. Nevertheless, NER is rarely investigated on this type of data, which differs severely from formal written text. In this paper, we present three new Turkish data sets from different domains within this focus area, namely Twitter, a speech-to-text interface and a hardware forum, annotated specifically for NER, and we report our first results on them. We believe the paper sheds light on the difficulty of these new domains for NER and on possible future work.

Original language: English
Title of host publication: AICT 2013 - 7th International Conference on Application of Information and Communication Technologies, Conference Proceedings
Publisher: IEEE Computer Society
ISBN (Print): 9781467364201
Publication status: Published - 2013
Event: 7th International Conference on Application of Information and Communication Technologies, AICT 2013 - Baku, Azerbaijan
Duration: 23 Oct 2013 to 25 Oct 2013

Publication series

Name: AICT 2013 - 7th International Conference on Application of Information and Communication Technologies, Conference Proceedings

Conference

Conference: 7th International Conference on Application of Information and Communication Technologies, AICT 2013
Country/Territory: Azerbaijan
City: Baku
Period: 23/10/13 to 25/10/13

Keywords

  • Conditional Random Fields
  • ENAMEX
  • Named Entity Recognition
  • Speech Data
  • Turkish
  • Twitter
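The keywords pair ENAMEX annotation with Conditional Random Fields. CRF taggers are usually trained on per-token labels, so inline entity markup is typically converted to a token-level scheme first. The sketch below illustrates that step under several assumptions not stated in this record: MUC-style inline tags of the form `<ENAMEX TYPE="...">...</ENAMEX>`, non-nested entities, plain whitespace tokenization, and the BIO labelling scheme. The function name `enamex_to_bio` is hypothetical and is not taken from the paper.

```python
import re

# Assumed MUC-style inline markup, e.g. <ENAMEX TYPE="PERSON">Gülşen Eryiğit</ENAMEX>.
# Entities are assumed not to be nested.
ENAMEX_RE = re.compile(r'<ENAMEX TYPE="([A-Z]+)">(.*?)</ENAMEX>')


def enamex_to_bio(text):
    """Convert one ENAMEX-annotated sentence into (token, BIO tag) pairs.

    Hypothetical helper: tokenization is plain whitespace splitting,
    which is a simplification for Turkish and noisy Web 2.0 text.
    """
    pairs = []
    pos = 0
    for match in ENAMEX_RE.finditer(text):
        # Tokens between the previous entity and this one are outside any entity.
        for token in text[pos:match.start()].split():
            pairs.append((token, "O"))
        entity_type, entity_text = match.group(1), match.group(2)
        # First token of an entity gets B-, following tokens get I-.
        for i, token in enumerate(entity_text.split()):
            prefix = "B" if i == 0 else "I"
            pairs.append((token, f"{prefix}-{entity_type}"))
        pos = match.end()
    # Tokens after the last entity.
    for token in text[pos:].split():
        pairs.append((token, "O"))
    return pairs


if __name__ == "__main__":
    sentence = 'Bugün <ENAMEX TYPE="PERSON">Gülşen Eryiğit</ENAMEX> konuştu'
    for token, tag in enamex_to_bio(sentence):
        print(token, tag)
```

The BIO scheme is chosen here only because it lets a linear-chain CRF mark where each multi-token entity begins. The labelling scheme and tokenizer actually used for these data sets may differ.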
