Abstract
With the growth of the social web, user-generated text data has reached unprecedented sizes. Non-canonical text normalization provides a way to exploit this data as a practical source of training data for language processing systems. The state of the art in Turkish text normalization is a token-level pipeline of modules that depends heavily on external linguistic resources and manually defined rules. Instead, we propose a fully automated, context-aware machine translation approach with fewer stages of processing. Experiments with various implementations of our approach show that it surpasses the current best-performing system by a large margin.
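As a rough illustration of what casting normalization as context-aware translation can mean, the sketch below turns a token-aligned pair of noisy and canonical Turkish sentences into one source/target training pair per token, with neighbouring noisy tokens kept as context. The function name `make_context_pairs`, the `<t>`/`</t>` markers, the window size, and the 1:1 token alignment are all hypothetical assumptions for illustration; this is not the system described in the paper.

```python
from typing import List, Tuple


def make_context_pairs(
    noisy: List[str], canonical: List[str], window: int = 2
) -> List[Tuple[str, str]]:
    """Build one (source, target) pair per token for a translation model.

    Hypothetical helper: assumes `noisy` and `canonical` are aligned 1:1,
    which real normalization data need not be (tokens can split or merge).
    """
    if len(noisy) != len(canonical):
        raise ValueError("noisy and canonical sequences must be token-aligned")
    pairs = []
    for i, target in enumerate(canonical):
        # Up to `window` noisy tokens on each side serve as context.
        left = noisy[max(0, i - window):i]
        right = noisy[i + 1:i + 1 + window]
        # Hypothetical markers tell the model which token to normalize.
        source = " ".join(left + ["<t>", noisy[i], "</t>"] + right)
        pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    # Asciified input ("bugun cok guzel") paired with its canonical form.
    for src, tgt in make_context_pairs(
        ["bugun", "cok", "guzel"], ["bugün", "çok", "güzel"], window=1
    ):
        print(f"{src}\t{tgt}")
```

The resulting pairs could be written out as a parallel corpus for any sequence-to-sequence toolkit; the context window is what distinguishes this framing from normalizing each token in isolation.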
| Original language | English |
|---|---|
| Title of host publication | ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 267-272 |
| Number of pages | 6 |
| ISBN (Electronic) | 9781950737475 |
| Publication status | Published - 2019 |
| Event | 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Student Research Workshop, SRW 2019 - Florence, Italy Duration: 28 Jul 2019 → 2 Aug 2019 |
Publication series
| Name | ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop |
|---|---|
Conference
| Conference | 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Student Research Workshop, SRW 2019 |
|---|---|
| Country/Territory | Italy |
| City | Florence |
| Period | 28/07/19 → 2/08/19 |
Bibliographical note
Publisher Copyright: © 2019 Association for Computational Linguistics.
Funding
The authors would like to thank Yves Scherrer for his valuable insights, and the Faculty of Arts at the University of Helsinki for funding a research visit during which this study materialized.
| Funders | Funder number |
|---|---|
| Yves Scherrer | |
| Helsingin Yliopisto | |