Abstract
Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents. However, their investigation on unstructured documents is an emerging research topic. This paper presents an approach to adapt a multimodal transformer (i.e., ViBERTgrid, previously explored on semi-structured documents) for unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we publicly released token-level annotations for the SROIE dataset in order to pave the way for its use in multimodal sequence labeling models.
Original language | English |
---|---|
Title of host publication | Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2023, Revised Selected Papers |
Editors | Rosa Meo, Fabrizio Silvestri |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 307-322 |
Number of pages | 16 |
ISBN (Print) | 9783031746420 |
DOIs | |
Publication status | Published - 2025 |
Event | Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 - Turin, Italy. Duration: 18 Sep 2023 → 22 Sep 2023 |
Publication series
Name | Communications in Computer and Information Science |
---|---|
Volume | 2137 CCIS |
ISSN (Print) | 1865-0929 |
ISSN (Electronic) | 1865-0937 |
Conference
Conference | Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023 |
---|---|
Country/Territory | Italy |
City | Turin |
Period | 18/09/23 → 22/09/23 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.