CL · LG · Jun 21, 2021

ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction

arXiv:2106.10786v1712 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of extracting structured information from form-like documents. The contribution is incremental: it enhances existing GCN methods for a specific domain.

The paper addresses how to capture the natural reading order of words in document information extraction. It proposes ROPE, a new positional encoding technique for graph-based models, which improves existing GCNs by up to 8.4% F1-score on word labeling and word grouping.

Natural reading orders of words are crucial for information extraction from form-like documents. Despite recent advances in Graph Convolutional Networks (GCNs) on modeling spatial layout patterns of documents, they have limited ability to capture reading orders of given word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to apprehend the sequential presentation of words in documents. ROPE generates unique reading order codes for neighboring words relative to the target word given a word-level graph connectivity. We study two fundamental document entity extraction tasks including word labeling and word grouping on the public FUNSD dataset and a large-scale payment dataset. We show that ROPE consistently improves existing GCNs with a margin up to 8.4% F1-score.
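The abstract describes reading order codes only at a high level, so the following is a minimal sketch of the idea, not the authors' implementation. It assumes each word carries a global reading-order index (for example, from an OCR engine) and that a code is the signed rank of a neighbor relative to the target word. The function name `reading_order_codes` and the dictionary-based graph layout are hypothetical.

```python
def reading_order_codes(order, edges):
    """Assign each neighbor a signed rank relative to its target word.

    order: dict mapping word id -> global reading-order index (assumed input).
    edges: dict mapping target word id -> list of neighbor word ids.
    Returns: dict mapping target -> {word id: code}, where the target gets 0,
    neighbors read earlier get -1, -2, ... and neighbors read later get 1, 2, ...
    """
    codes = {}
    for target, neighbors in edges.items():
        # Neighbors read before the target, nearest first.
        before = sorted(
            (n for n in neighbors if order[n] < order[target]),
            key=order.get,
            reverse=True,
        )
        # Neighbors read after the target, nearest first.
        after = sorted(
            (n for n in neighbors if order[n] > order[target]),
            key=order.get,
        )
        local = {target: 0}
        for rank, n in enumerate(before, start=1):
            local[n] = -rank
        for rank, n in enumerate(after, start=1):
            local[n] = rank
        codes[target] = local
    return codes


# Toy example: four words of an invoice header, indexed in reading order.
order = {"Invoice": 0, "No": 1, "12345": 2, "Date": 3}
edges = {"No": ["Invoice", "12345", "Date"]}
codes = reading_order_codes(order, edges)

# Shifting every global index by a constant leaves the codes unchanged,
# because only the relative order within the neighborhood matters.
shifted = {word: index + 100 for word, index in order.items()}
shifted_codes = reading_order_codes(shifted, edges)
```

In this sketch the codes depend only on the relative order of words inside each neighborhood, which is one simple way to read the "relative to the target word" phrasing in the abstract. A real model would then map these integer codes to vectors (for instance with a learned or sinusoidal embedding) before feeding them to the GCN; that step is omitted here.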

