Normalizing Non-canonical Turkish texts using machine translation approaches

Talha Çolakoğlu, Umut Sulubacak, A. Cuneyd Tantug

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Citations (Scopus)

Abstract

With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token-level pipeline of modules, heavily dependent on external linguistic resources and manually-defined rules. Instead, we propose a fully-automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin.

Original language: English
Title of host publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 267-272
Number of pages: 6
ISBN (Electronic): 9781950737475
Publication status: Published - 2019
Event: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Student Research Workshop, SRW 2019 - Florence, Italy
Duration: 28 Jul 2019 → 2 Aug 2019

Publication series

Name: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop

Conference

Conference: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Student Research Workshop, SRW 2019
Country/Territory: Italy
City: Florence
Period: 28/07/19 → 2/08/19

Bibliographical note

Publisher Copyright:
© 2019 Association for Computational Linguistics.
