Abstract
With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is composed of a token-level pipeline of modules, heavily dependent on external linguistic resources and manually-defined rules. Instead, we propose a fully-automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that we are able to surpass the current best-performing system by a large margin.
Original language | English |
---|---|
Title of host publication | ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 267-272 |
Number of pages | 6 |
ISBN (Electronic) | 9781950737475 |
Publication status | Published - 2019 |
Event | 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Student Research Workshop, SRW 2019 - Florence, Italy. Duration: 28 Jul 2019 → 2 Aug 2019 |
Publication series
Name | ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop |
---|---|
Conference
Conference | 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Student Research Workshop, SRW 2019 |
---|---|
Country/Territory | Italy |
City | Florence |
Period | 28/07/19 → 2/08/19 |
Bibliographical note
Publisher Copyright: © 2019 Association for Computational Linguistics.