AI · CL · CV · IR · Sep 23, 2024

ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents

arXiv:2409.15004v1 · 2 citations · h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses key information extraction from unstructured financial documents for domain-specific applications. It represents an incremental advance.

The paper tackles key information extraction from unstructured financial documents by adding a BiLSTM-CRF layer to a multimodal transformer (ViBERTgrid), improving named entity recognition performance by up to 2 percentage points.

Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents. However, their application to unstructured documents is still an emerging research topic. The paper presents an approach to adapt a multimodal transformer (i.e., ViBERTgrid, previously explored on semi-structured documents) to unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, the authors publicly released token-level annotations for the SROIE dataset to pave the way for its use in multimodal sequence labeling models.
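To illustrate what a CRF layer on top of a BiLSTM typically contributes at inference time, here is a minimal sketch of Viterbi decoding in plain Python. It is not the paper's implementation: the tag set, the emission scores (standing in for BiLSTM outputs), and the transition scores are all made-up values chosen for illustration. The point is that transition scores let the decoder reject tag sequences that per-token argmax would produce, such as an I-ORG tag directly after O.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence.

    emissions[t][j]: score of tag j at token t (e.g. produced by a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] = score of the best path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Previous tag that maximizes the path score into tag j
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag
    tag = max(range(num_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path


# Hypothetical tag set and scores, for illustration only.
TAGS = ["O", "B-ORG", "I-ORG"]
EMISSIONS = [
    [2.0, 1.0, 0.0],  # token 0: O scores highest
    [0.0, 1.0, 1.5],  # token 1: I-ORG scores highest on its own
    [2.0, 0.0, 0.5],  # token 2: O scores highest
]
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O: moving to I-ORG is heavily penalized
    [0.0, 0.0, 0.0],    # from B-ORG
    [0.0, 0.0, 0.0],    # from I-ORG
]

# Per-token argmax ignores transitions and yields O, I-ORG, O.
greedy = [max(range(len(TAGS)), key=lambda j: row[j]) for row in EMISSIONS]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS)
print([TAGS[i] for i in greedy])   # ['O', 'I-ORG', 'O']
print([TAGS[i] for i in decoded])  # ['O', 'B-ORG', 'O']
```

In a trained model the transition scores are learned jointly with the encoder rather than set by hand, and training uses the CRF log-likelihood, which this sketch does not cover.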

