Abstract
Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents. However, their application to unstructured documents remains an emerging research topic. This paper presents an approach to adapt a multimodal transformer (i.e., ViBERTgrid, previously explored on semi-structured documents) to unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we have publicly released token-level annotations for the SROIE dataset to pave the way for its use in multimodal sequence labeling models.
| Original language | English |
|---|---|
| Title of host publication | Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2023, Revised Selected Papers |
| Editors | Rosa Meo, Fabrizio Silvestri |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 307-322 |
| Number of pages | 16 |
| ISBN (Print) | 9783031746420 |
| Publication status | Published - 2025 |
| Event | Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 - Turin, Italy; Duration: 18 Sept 2023 → 22 Sept 2023 |
Publication series
| Name | Communications in Computer and Information Science |
|---|---|
| Volume | 2137 CCIS |
| ISSN (Print) | 1865-0929 |
| ISSN (Electronic) | 1865-0937 |
Conference
| Conference | Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 |
|---|---|
| Country/Territory | Italy |
| City | Turin |
| Period | 18/09/23 → 22/09/23 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
Keywords
- Multimodal Information Extraction
- Natural Language Processing
- Unstructured Financial Documents