CV | Oct 14, 2024

ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training

Zhouqiang Jiang, Bowen Wang, Junhao Chen, Yuta Nakashima

arXiv:2410.10471v2 | 19.421 citations | h-index: 14 | COLING

Originality: Incremental advance

AI Analysis

The paper addresses the unrealistic reliance on manually annotated semantic groups in document understanding, which limits practical applications. It represents an incremental improvement.

The paper tackles the problem of visually-rich document understanding by introducing a new real-world variant (ReVrDU) that disallows manually annotated semantic groups, and proposes ReLayout, a method that learns semantic grouping by arranging words, showing superior performance over existing methods.

Recent approaches for visually-rich document understanding (VrDU) use manually annotated semantic groups, where a semantic group encompasses all semantically relevant but not obviously grouped words. As OCR tools are unable to automatically identify such grouping, we argue that current VrDU approaches are unrealistic. We thus introduce a new variant of the VrDU task, real-world visually-rich document understanding (ReVrDU), that does not allow the use of manually annotated semantic groups. We also propose a new method, ReLayout, compliant with the ReVrDU scenario, which learns to capture semantic grouping by arranging words and bringing the representations of words that potentially belong to the same semantic group closer together. Our experimental results demonstrate that the performance of existing methods deteriorates on the ReVrDU task, while ReLayout shows superior performance.
