CL · AI · LG · Mar 10, 2021

DeepCPCFG: Deep Learning and Context Free Grammars for End-to-End Information Extraction

arXiv:2103.05908v2 · 7 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of automating information extraction from complex documents such as invoices, reducing the need for costly manual annotation.

The paper tackles the problem of extracting structured information from business documents without detailed annotations. It proposes DeepCPCFG, an end-to-end system that combines deep learning with context-free grammars and achieves state-of-the-art results on scanned invoices.

We address the challenge of extracting structured information from business documents without detailed annotations. We propose Deep Conditional Probabilistic Context-Free Grammars (DeepCPCFG) to parse complex two-dimensional documents, and we use Recursive Neural Networks to build an end-to-end system that finds the most probable parse, which represents the structured information to be extracted. The system is trained end-to-end with scanned documents as input and only relational records as labels. The relational records are extracted from existing databases, avoiding the cost of annotating documents by hand. We apply this approach to extract information from scanned invoices, achieving state-of-the-art results despite using no hand annotations.
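Finding "the most probable parse" under a probabilistic context-free grammar is classically done with Viterbi-style CKY dynamic programming. The sketch below illustrates only that inference step and is not the authors' implementation. It assumes a one-dimensional token sequence (the paper parses two-dimensional documents), a hypothetical toy invoice grammar in Chomsky normal form, and fixed hand-set log-scores standing in for the neural network scores DeepCPCFG would compute. The grammar, labels, `terminal_scores`, and `cky_viterbi` are all illustrative names.

```python
import math
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL toy grammar in Chomsky normal form: (parent, left, right) -> log-score.
# In DeepCPCFG, rule scores would come from neural networks conditioned on the
# document; here they are fixed constants chosen for illustration only.
BINARY_RULES: Dict[Tuple[str, str, str], float] = {
    ("Invoice", "Header", "Total"): math.log(1.0),
    ("Header", "Key", "Value"): math.log(0.9),
    ("Total", "Key", "Amount"): math.log(0.8),
}


def terminal_scores(token: str) -> Dict[str, float]:
    """Hypothetical hand-written token->label log-scores (stand-in for a learned model)."""
    scores: Dict[str, float] = {}
    if token.endswith((":", "#")):
        scores["Key"] = math.log(0.9)
    if token.startswith("$"):
        scores["Amount"] = math.log(0.9)
        scores["Value"] = math.log(0.1)
    elif token.isdigit():
        scores["Value"] = math.log(0.8)
        scores["Amount"] = math.log(0.2)
    return scores


def cky_viterbi(tokens: List[str], start: str = "Invoice") -> Optional[Tuple[float, tuple]]:
    """Return (best log-score, parse tree) for the tokens, or None if no parse exists."""
    n = len(tokens)
    if n == 0:
        return None
    # chart[(i, j)][label] = (best log-score, backpointer) for the span tokens[i:j]
    chart: Dict[Tuple[int, int], Dict[str, Tuple[float, object]]] = {}
    for i, tok in enumerate(tokens):
        chart[(i, i + 1)] = {lab: (s, tok) for lab, s in terminal_scores(tok).items()}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell: Dict[str, Tuple[float, object]] = {}
            for k in range(i + 1, j):  # every split point of the span
                left, right = chart[(i, k)], chart[(k, j)]
                for (parent, lab_l, lab_r), rule_score in BINARY_RULES.items():
                    if lab_l in left and lab_r in right:
                        # Viterbi: keep only the max-scoring derivation per label
                        score = rule_score + left[lab_l][0] + right[lab_r][0]
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, (k, lab_l, lab_r))
            chart[(i, j)] = cell
    if start not in chart[(0, n)]:
        return None
    return chart[(0, n)][start][0], _build_tree(chart, 0, n, start)


def _build_tree(chart, i: int, j: int, label: str) -> tuple:
    """Follow backpointers to recover the best tree for span [i, j) with this label."""
    _, back = chart[(i, j)][label]
    if j - i == 1:
        return (label, back)  # leaf node: (label, token)
    k, lab_l, lab_r = back
    return (label, _build_tree(chart, i, k, lab_l), _build_tree(chart, k, j, lab_r))


score, tree = cky_viterbi(["Invoice#", "1234", "Total:", "$56.00"])
print(round(math.exp(score), 6), tree)
```

On this input the best parse groups `Invoice#` and `1234` under `Header` and `Total:` and `$56.00` under `Total`; the labeled leaves of such a tree are what would be read off as fields of a record. Scores are summed in log space to avoid underflow on long documents.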

Code implementations: 1 repo
