Document Classification and Key Information Extraction Using Multimodal Transformers

Mehmet Selman Baysan, Furkan Kizilay, Ayşe Irem Özmen, Gökhan Ince

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Companies manage and track their expenses either physically or through software applications. However, manual expense entry is prone to errors, and these errors cause losses in terms of money, time, and productivity. Therefore, this study presents a novel system for automating document information entry, with a special focus on financial documents, using machine learning techniques. The methodology involves training LayoutLM models for sequence and token classification to categorize and extract detailed information from various financial documents such as receipts and invoices. The proposed system integrates state-of-the-art models such as LayoutLMv2, LayoutLMv3, and fastText to achieve accurate document classification and information extraction. The designed system was implemented and tested on various types of receipts and invoices containing financial values, using evaluation metrics such as accuracy, precision, recall, and F1-score. The proposed system achieves accuracy, precision, and F1 scores above 90% across various document types and automated document processing tasks, which confirms its suitability for document processing applications.
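The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The label names (TOTAL, DATE, O) and the choice of macro averaging are assumptions made for illustration only; the abstract does not state which label set or averaging scheme the authors used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption for this sketch, not a detail
    taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical token labels for one receipt (illustrative data only).
y_true = ["TOTAL", "DATE", "O", "O", "TOTAL", "DATE"]
y_pred = ["TOTAL", "DATE", "O", "TOTAL", "TOTAL", "O"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The same function applies to document-level (sequence) classification by passing one label per document instead of one label per token.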

Original language: English
Title of host publication: UBMK 2024 - Proceedings
Subtitle of host publication: 9th International Conference on Computer Science and Engineering
Editors: Esref Adali
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 276-281
Number of pages: 6
ISBN (Electronic): 9798350365887
Publication status: Published - 2024
Event: 9th International Conference on Computer Science and Engineering, UBMK 2024 - Antalya, Turkey
Duration: 26 Oct 2024 to 28 Oct 2024

Publication series

Name: UBMK 2024 - Proceedings: 9th International Conference on Computer Science and Engineering

Conference

Conference: 9th International Conference on Computer Science and Engineering, UBMK 2024
Country/Territory: Turkey
City: Antalya
Period: 26/10/24 to 28/10/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • Document Automation
  • Financial Document Processing
  • Key Information Extraction (KIE)
  • LayoutLM
  • Token Classification
