Client-Driven Content Extraction Associated with Table
This work addresses the need for client-customizable table extraction in document processing, but it appears incremental, as it builds on existing graph-based methods.
The paper tackles the problem of extracting content from tables in document images. It learns patterns from client-provided key fields, represents them as attributed relational graphs (ARGs) to mine similar graphs from the images, and validates the concept on a real-world industrial problem.
The goal of the project is to extract content from tables in document images based on learnt patterns. Real-world users (i.e., clients) first provide a set of key fields within the table that they consider important. These fields are used to build a graph in which nodes are labelled with semantics and other features, and edges are attributed with relations. The resulting attributed relational graph (ARG) is then used to mine similar graphs from a document image. Each mined graph represents one item within the table, and hence a set of such graphs composes the table. We validated the concept on a real-world industrial problem.
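The pipeline in the abstract can be illustrated with a minimal sketch. It assumes that OCR and a tagger have already turned the document image into tokens, each carrying a semantic label and a bounding-box position, and it uses a small hypothetical relation vocabulary (`right_of`, `left_of`, `below`, `above`) computed from coordinates. The names `Field`, `relation`, `build_arg`, `mine`, and `extract_table` are illustrative only, not the authors' implementation, and the "other features" on nodes are omitted. Mining is done here by a brute-force backtracking search for label- and relation-preserving assignments, which is one simple way to match an ARG but is not necessarily the matching method used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    """A tagged text token (hypothetical OCR + tagger output)."""
    id: int
    label: str   # semantic label, e.g. "desc", "qty", "amount" (assumed)
    x: float     # left coordinate of the bounding box
    y: float     # top coordinate of the bounding box


def relation(a: Field, b: Field, tol: float = 5.0) -> str:
    """Coarse spatial relation of b with respect to a (assumed vocabulary)."""
    if abs(a.y - b.y) <= tol:                      # same text line
        return "right_of" if b.x > a.x else "left_of"
    return "below" if b.y > a.y else "above"


@dataclass
class ARG:
    nodes: List[str]                    # node i carries a semantic label
    edges: Dict[Tuple[int, int], str]   # (i, j) -> relation attribute


def build_arg(key_fields: List[Field]) -> ARG:
    """Build a fully connected ARG from the client's key fields."""
    n = len(key_fields)
    edges = {(i, j): relation(key_fields[i], key_fields[j])
             for i in range(n) for j in range(n) if i != j}
    return ARG([f.label for f in key_fields], edges)


def mine(pattern: ARG, doc: List[Field]) -> List[List[Field]]:
    """Return every assignment of document tokens to pattern nodes that
    preserves node labels and edge relations (brute-force matching)."""
    matches: List[List[Field]] = []

    def extend(assigned: List[Field]) -> None:
        k = len(assigned)
        if k == len(pattern.nodes):
            matches.append(list(assigned))
            return
        for cand in doc:
            if cand in assigned or cand.label != pattern.nodes[k]:
                continue
            # Edges in both directions to already-assigned nodes must agree.
            if all(relation(assigned[i], cand) == pattern.edges[(i, k)]
                   and relation(cand, assigned[i]) == pattern.edges[(k, i)]
                   for i in range(k)):
                assigned.append(cand)
                extend(assigned)
                assigned.pop()

    extend([])
    return matches


def extract_table(pattern: ARG, doc: List[Field]) -> List[List[Field]]:
    """Greedily keep non-overlapping matches, top to bottom; each is one item."""
    used, table = set(), []
    for m in sorted(mine(pattern, doc), key=lambda m: (m[0].y, m[0].x)):
        ids = {f.id for f in m}
        if ids.isdisjoint(used):
            used |= ids
            table.append(m)
    return table


# Usage: the client marks three key fields on one row of an example table.
key_fields = [Field(0, "desc", 10, 100), Field(1, "qty", 200, 100),
              Field(2, "amount", 300, 100)]
# A synthetic document with three rows of the same layout, 20 units apart.
doc = [Field(10 + 3 * r + c, lbl, x, 100 + 20 * r)
       for r in range(3)
       for c, (lbl, x) in enumerate([("desc", 10), ("qty", 200), ("amount", 300)])]
table = extract_table(build_arg(key_fields), doc)
print([[f.id for f in item] for item in table])
# -> [[10, 11, 12], [13, 14, 15], [16, 17, 18]]
```

The exhaustive search is exponential in the number of key fields, which is tolerable only because a client typically marks a handful of them; a real system would likely prune candidates more aggressively or use a dedicated subgraph-matching algorithm. The greedy disjoint selection is likewise an assumption, chosen so that each token belongs to at most one table item.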