Word Alignment for English Turkish Language Pair

M. Talha Cakmak, Süleyman Acar, Gülsen Eryigit

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

9 Citations (Scopus)

Abstract

Word alignment is an important step in machine translation systems. Although many studies report very high alignment performance between grammatically similar languages, the same does not hold for language pairs from different language families. In this study, we focus on the English-Turkish language pair. Turkish is a highly agglutinative language with a rich and very productive morphology, whereas English morphology is comparatively poor. As a result, one Turkish word is usually aligned with several English words, and traditional models that use word-level alignment generally fail in such circumstances. We evaluate a GIZA++ system that splits words into their morphological units (stem and suffixes) and compare it with the traditional word-level model. For the first time, we evaluate the performance of our aligner on gold-standard parallel sentences rather than within a real machine translation system. Our approach reduces the alignment error rate by 40% relative. Finally, a new test corpus of 300 manually aligned sentences is released together with this study.
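The abstract reports its result as a relative reduction in alignment error rate (AER). As a rough illustration only, the sketch below assumes the commonly used AER definition over sure (S) and possible (P) gold links, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|); the record does not state which variant the paper uses. The function names, the toy link sets, and the 0.5 and 0.3 error rates are hypothetical and are not data from the study.

```python
def alignment_error_rate(predicted, sure, possible=None):
    """Assumed AER definition: 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Links are (source_index, target_index) pairs. Sure links are
    treated as a subset of the possible links.
    """
    a = set(predicted)
    s = set(sure)
    p = s | set(possible or ())
    if not a and not s:
        return 0.0  # nothing to align and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


def relative_reduction(baseline, improved):
    """Fraction of the baseline error that the improved system removes."""
    return (baseline - improved) / baseline


# Hypothetical gold links for one sentence pair.
sure_links = {(0, 0), (1, 1), (2, 1)}
possible_links = {(3, 2)}

# Hypothetical system output: two correct links and one wrong link.
predicted_links = {(0, 0), (1, 1), (3, 3)}

aer = alignment_error_rate(predicted_links, sure_links, possible_links)
print(f"AER: {aer:.3f}")

# A drop from an error rate of 0.5 to 0.3 is a 40% relative reduction.
print(f"Relative reduction: {relative_reduction(0.5, 0.3):.0%}")
```

A relative reduction is measured against the baseline error, so the same 40% figure can correspond to very different absolute differences depending on where the baseline starts.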

Original language: English
Title of host publication: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
Editors: Mehmet Ugur Dogan, Joseph Mariani, Asuncion Moreno, Sara Goggi, Khalid Choukri, Nicoletta Calzolari, Jan Odijk, Thierry Declerck, Bente Maegaard, Stelios Piperidis, Helene Mazo, Olivier Hamon
Publisher: European Language Resources Association (ELRA)
Pages: 2177-2180
Number of pages: 4
ISBN (Electronic): 9782951740877
Publication status: Published - 2012
Event: 8th International Conference on Language Resources and Evaluation, LREC 2012 - Istanbul, Turkey
Duration: 21 May 2012 to 27 May 2012

Publication series

Name: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012

Conference

Conference: 8th International Conference on Language Resources and Evaluation, LREC 2012
Country/Territory: Turkey
City: Istanbul
Period: 21/05/12 to 27/05/12

Keywords

  • Machine Translation
  • Turkish
  • Word Alignment
