CL Dec 31, 2019

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

arXiv:1912.13318v5
1011 citations
Originality Highly original
AI Analysis

LayoutLM addresses the need for better information extraction from scanned documents, which matters for real-world applications such as form and receipt processing. According to the authors, it is the first approach to learn text and layout jointly in a single framework for document-level pre-training.

The paper tackles document image understanding by proposing LayoutLM, a model that jointly pre-trains on text and layout information. It reports new state-of-the-art results on three downstream tasks: form understanding (70.72 to 79.27), receipt understanding (94.02 to 95.24), and document image classification (93.07 to 94.42).
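To put the reported numbers side by side, the short sketch below computes the absolute gain on each task. The scores are copied from the summary above; the dictionary layout and variable names are only illustrative, and the gains are in points of whatever metric each task reports.

```python
# Reported scores: (previous best, LayoutLM), as quoted in the summary.
results = {
    "form understanding": (70.72, 79.27),
    "receipt understanding": (94.02, 95.24),
    "document image classification": (93.07, 94.42),
}

# Absolute gain per task, rounded to the two decimals used in the source.
gains = {task: round(new - old, 2) for task, (old, new) in results.items()}

for task, gain in gains.items():
    print(f"{task}: +{gain:.2f} points")
```

The largest gain is on form understanding, where layout cues such as field position plausibly carry the most information.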

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm.
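One way to picture "jointly modelling text and layout" is to give every token a layout embedding, derived from its bounding box, and add it to the ordinary word embedding. The sketch below is a minimal illustration of that idea and not the authors' implementation: the table sizes, the integer coordinate grid, the sharing of one table for both x coordinates and one for both y coordinates, and all names (`make_table`, `embed_tokens`, `VOCAB`, `GRID`, `DIM`) are assumptions made for the example. Random values stand in for learned weights.

```python
import random

def make_table(rows, dim, rng):
    """Build a random lookup table (a stand-in for learned embedding weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

def embed_tokens(token_ids, boxes, tables):
    """Sum one word embedding and four layout embeddings per token.

    boxes holds one (x0, y0, x1, y1) tuple per token, assumed to be
    already scaled to integer grid coordinates in [0, GRID).
    """
    vectors = []
    for tok, (x0, y0, x1, y1) in zip(token_ids, boxes):
        parts = [
            tables["word"][tok],
            tables["x"][x0], tables["y"][y0],  # upper-left corner
            tables["x"][x1], tables["y"][y1],  # lower-right corner
        ]
        # Element-wise sum across the five embedding vectors.
        vectors.append([sum(vals) for vals in zip(*parts)])
    return vectors

# Hypothetical sizes chosen only to keep the example small.
VOCAB, GRID, DIM = 100, 1001, 8
rng = random.Random(0)
tables = {
    "word": make_table(VOCAB, DIM, rng),
    "x": make_table(GRID, DIM, rng),
    "y": make_table(GRID, DIM, rng),
}

# The same token id placed at two different positions on the page.
token_ids = [5, 5]
boxes = [(100, 50, 180, 70), (100, 400, 180, 420)]
vectors = embed_tokens(token_ids, boxes, tables)
```

Because the layout terms differ, the two occurrences of the same token receive different input vectors, which is what lets a downstream encoder distinguish, say, a number in a header from the same number in a totals row.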

Code Implementations
19 repos

Data from Papers with Code (CC-BY-SA-4.0)

