Freddy C. Chua

1paper

1 Paper

CLMar 10, 2021
DeepCPCFG: Deep Learning and Context Free Grammars for End-to-End Information Extraction

Freddy C. Chua, Nigel P. Duffy

We address the challenge of extracting structured information from business documents without detailed annotations. We propose Deep Conditional Probabilistic Context Free Grammars (DeepCPCFG) to parse two-dimensional complex documents and use Recursive Neural Networks to create an end-to-end system for finding the most probable parse that represents the structured information to be extracted. This system is trained end-to-end with scanned documents as input and only relational-records as labels. The relational-records are extracted from existing databases avoiding the cost of annotating documents by hand. We apply this approach to extract information from scanned invoices achieving state-of-the-art results despite using no hand-annotations.