AI CL CV IR

Sep 23, 2024

ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents

Furkan Pala, Mehmet Yasin Akpınar, Onur Deniz, Gülşen Eryiğit

arXiv:2409.15004v12.32 citations

h-index: 4

Originality: Incremental advance

AI Analysis

This paper addresses the problem of extracting key information from unstructured financial documents for domain-specific applications. It represents an incremental advance.

The paper tackles key information extraction from unstructured financial documents by adding a BiLSTM-CRF layer to a multimodal transformer, improving named entity recognition performance by up to 2 percentage points.

Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents. However, their investigation on unstructured documents is an emerging research topic. The paper presents an approach to adapt a multimodal transformer (i.e., ViBERTgrid, previously explored on semi-structured documents) for unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we publicly released token-level annotations for the SROIE dataset in order to pave the way for its use in multimodal sequence labeling models.
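As a rough illustration of what the CRF part of a BiLSTM-CRF tagging head does (a generic sketch, not the authors' implementation), the example below runs Viterbi decoding over per-token emission scores, such as those a BiLSTM might produce, combined with a tag-transition matrix. The function name `viterbi_decode`, the tag set, and every score are hypothetical values chosen for the example.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index sequence for one token sequence.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM).
    transitions[i][j] : score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] = best score of any path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_tags):
            # Previous tag that maximizes the path score into tag j
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path back from the best final tag
    path = [max(range(num_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores (assumptions for illustration only)
TAGS = ["O", "B-ORG", "I-ORG"]
transitions = [
    [0.0, 0.0, -10.0],  # from O: jumping straight to I-ORG is heavily penalized
    [0.0, 0.0, 1.0],    # from B-ORG
    [0.0, 0.0, 1.0],    # from I-ORG
]
emissions = [
    [2.0, 0.0, 0.0],  # token 0 looks like O
    [0.5, 0.4, 1.0],  # token 1: I-ORG has the highest emission score on its own
    [0.0, 0.0, 2.0],  # token 2 looks like I-ORG
]
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

With these toy scores, picking the best tag per token independently would label token 1 as I-ORG directly after O, which is an invalid span start. The transition penalty steers the decoder to B-ORG instead, which is the kind of sequence-level consistency a CRF layer is meant to add on top of token-wise predictions.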
