ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents

Furkan Pala*, Mehmet Yasin Akpınar, Onur Deniz, Gülşen Eryiğit

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents. However, their investigation on unstructured documents is an emerging research topic. The paper presents an approach to adapt a multimodal transformer (i.e., ViBERTgrid, previously explored on semi-structured documents) for unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we publicly released token-level annotations for the SROIE dataset in order to pave the way for its use in multimodal sequence labeling models.

Original language: English
Title of host publication: Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2023, Revised Selected Papers
Editors: Rosa Meo, Fabrizio Silvestri
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 307-322
Number of pages: 16
ISBN (Print): 9783031746420
Publication status: Published - 2025
Event: Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 - Turin, Italy
Duration: 18 Sep 2023 to 22 Sep 2023

Publication series

Name: Communications in Computer and Information Science
Volume: 2137 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023
Country/Territory: Italy
City: Turin
Period: 18/09/23 to 22/09/23

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
