Abstract
Optical Character Recognition (OCR) is the process of extracting text from images with specialized software and converting it into a machine-readable form. OCR quality directly affects the quality of most downstream natural language processing tasks. Many applications in everyday use, such as text classification, information extraction and text summarization, operate on text extracted from images. Detecting and correcting text that was incorrectly recognized by OCR is therefore an active research topic, addressed today with a wide range of methods. This study applies recent neural machine translation methods to detect and correct post-OCR text errors and evaluates the results on the dataset released for the International Conference on Document Analysis and Recognition (ICDAR) 2019 post-OCR error detection and correction competition.
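Treating post-OCR correction as machine translation means pairing each noisy OCR string (the source) with its ground-truth transcription (the target) and training a sequence-to-sequence model on those pairs. The sketch below illustrates only that data framing and a character error rate (CER) check. It rests on illustrative assumptions that are not taken from the paper: character-level tokenization, the `▁` space marker, and the helper names `to_char_tokens`, `make_parallel_pair` and `cer` are all hypothetical. It includes no NMT model and does not parse the ICDAR 2019 competition files.

```python
from typing import List, Tuple

SPACE_MARK = "\u2581"  # hypothetical marker that makes spaces visible as a token


def to_char_tokens(text: str) -> List[str]:
    """Split text into character tokens, replacing spaces with SPACE_MARK."""
    return [SPACE_MARK if ch == " " else ch for ch in text]


def make_parallel_pair(ocr_text: str, gold_text: str) -> Tuple[str, str]:
    """Build one (source, target) training pair: noisy OCR -> ground truth."""
    src = " ".join(to_char_tokens(ocr_text))
    tgt = " ".join(to_char_tokens(gold_text))
    return src, tgt


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete ca
                curr[j - 1] + 1,          # insert cb
                prev[j - 1] + (ca != cb)  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate of a hypothesis against a non-empty reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


if __name__ == "__main__":
    ocr, gold = "Tbe qnick fox", "The quick fox"
    src, tgt = make_parallel_pair(ocr, gold)
    print(src)  # T b e ▁ q n i c k ▁ f o x
    print(tgt)  # T h e ▁ q u i c k ▁ f o x
    # CER of the raw OCR output; a corrected model output would be scored the same way
    print(round(cer(ocr, gold), 4))  # 0.1538
```

Character-level tokens are one common choice for this task because OCR errors usually change individual characters inside a word. A word- or subword-level vocabulary is an equally valid alternative, and the CER comparison works unchanged for either.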
Translated title of the contribution | Neural Machine Translation Approaches for Post-OCR Text Processing |
---|---|
Original language | Turkish |
Title of host publication | 2022 30th Signal Processing and Communications Applications Conference, SIU 2022 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781665450928 |
DOIs | |
Publication status | Published - 2022 |
Event | 30th Signal Processing and Communications Applications Conference, SIU 2022 - Safranbolu, Turkey. Duration: 15 May 2022 → 18 May 2022 |
Publication series
Name | 2022 30th Signal Processing and Communications Applications Conference, SIU 2022 |
---|---|
Conference
Conference | 30th Signal Processing and Communications Applications Conference, SIU 2022 |
---|---|
Country/Territory | Turkey |
City | Safranbolu |
Period | 15/05/22 → 18/05/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.