ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents

Furkan Pala*, Mehmet Yasin Akpınar, Onur Deniz, Gülşen Eryiğit

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents. However, their investigation on unstructured documents is an emerging research topic. This paper presents an approach to adapt a multimodal transformer (ViBERTgrid, previously explored on semi-structured documents) to unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we publicly released token-level annotations for the SROIE dataset in order to pave the way for its use in multimodal sequence labeling models.

Original language: English
Title of host publication: Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2023, Revised Selected Papers
Editors: Rosa Meo, Fabrizio Silvestri
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 307-322
Number of pages: 16
ISBN (Print): 9783031746420
Publication status: Published - 2025
Event: Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 - Turin, Italy
Duration: 18 Sept 2023 to 22 Sept 2023

Publication series

Name: Communications in Computer and Information Science
Volume: 2137 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023
Country/Territory: Italy
City: Turin
Period: 18/09/23 to 22/09/23

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords

  • Multimodal Information Extraction
  • Natural Language Processing
  • Unstructured Financial Documents
