ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction
This work addresses the challenge of extracting structured information from form-like documents. The contribution is incremental: it enhances existing GCN methods for a specific domain.
The paper tackles the problem of capturing the natural reading order of words in document information extraction by proposing ROPE, a new positional encoding technique for graph-based models, which improves existing GCNs by up to 8.4% F1-score on word labeling and word grouping.
Natural reading orders of words are crucial for information extraction from form-like documents. Despite recent advances in Graph Convolutional Networks (GCNs) on modeling spatial layout patterns of documents, they have limited ability to capture reading orders of given word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to apprehend the sequential presentation of words in documents. ROPE generates unique reading order codes for neighboring words relative to the target word given a word-level graph connectivity. We study two fundamental document entity extraction tasks, word labeling and word grouping, on the public FUNSD dataset and a large-scale payment dataset. We show that ROPE consistently improves existing GCNs by a margin of up to 8.4% F1-score.
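
To make the idea of reading order codes relative to a target word concrete, the following is a minimal sketch and not the authors' implementation. It assumes each word is represented by the (x, y) center of its bounding box, that reading order can be approximated by sorting top-to-bottom and then left-to-right, and that a code is the signed rank offset from the target. The function name `reading_order_codes` and the `centers` mapping are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a word is summarized by the (x, y) center of its bounding box.
Center = Tuple[float, float]


def reading_order_codes(
    target: int,
    neighbors: List[int],
    centers: Dict[int, Center],
) -> Dict[int, int]:
    """Assign each neighbor an integer code relative to the target word.

    Assumption: reading order is approximated by sorting on (y, x),
    i.e. top-to-bottom, then left-to-right. The target receives code 0,
    words read before it get negative codes, words read after it get
    positive codes.
    """
    words = [target] + list(neighbors)
    ordered = sorted(words, key=lambda w: (centers[w][1], centers[w][0]))
    target_rank = ordered.index(target)
    return {w: rank - target_rank for rank, w in enumerate(ordered)}


# Toy layout: word 0 is the target, words 1-3 are its graph neighbors.
centers = {
    0: (50.0, 10.0),  # target
    1: (10.0, 10.0),  # same line, to the left
    2: (90.0, 10.0),  # same line, to the right
    3: (50.0, 30.0),  # next line, below
}
codes = reading_order_codes(0, [1, 2, 3], centers)
print(codes)  # {1: -1, 0: 0, 2: 1, 3: 2}
```

Because the codes depend only on the relative ordering of the target and its neighbors, translating the whole neighborhood on the page leaves them unchanged in this sketch. A real system would likely need a more robust notion of reading order (for example, grouping words into lines before sorting), and the resulting codes would then be embedded and fed to the GCN alongside the other node or edge features.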