Abstract
Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents, but their application to unstructured documents is an emerging research topic. This paper presents an approach to adapt a multimodal transformer (ViBERTgrid, previously explored on semi-structured documents) to unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we publicly released token-level annotations for the SROIE dataset to pave the way for its use in multimodal sequence labeling models.
Original language | English |
---|---|
Title of host publication | Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2023, Revised Selected Papers |
Editors | Rosa Meo, Fabrizio Silvestri |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 307-322 |
Number of pages | 16 |
ISBN (Print) | 9783031746420 |
Publication status | Published - 2025 |
Event | Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 - Turin, Italy. Duration: 18 Sept 2023 → 22 Sept 2023 |
Publication series
Name | Communications in Computer and Information Science |
---|---|
Volume | 2137 CCIS |
ISSN (Print) | 1865-0929 |
ISSN (Electronic) | 1865-0937 |
Conference
Conference | Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 |
---|---|
Country/Territory | Italy |
City | Turin |
Period | 18/09/23 → 22/09/23 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
Keywords
- Multimodal Information Extraction
- Natural Language Processing
- Unstructured Financial Documents