IRAIOct 16, 2020

A Conglomerate of Multiple OCR Table Detection and Extraction

arXiv:2010.08591v13 citations
Originality Synthesis-oriented
AI Analysis

This addresses a specific challenge in document processing for industries dealing with OCR data, but it appears incremental as it builds on existing techniques without claiming major breakthroughs.

The paper tackles the problem of detecting and extracting multiple tables from OCR documents or images, which is challenging for industry, and proposes an algorithm that combines image processing, text recognition, and procedural coding to map text to cells in dataframes for storage in various formats.

Information representation as tables are compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used, however industry still faces challenge in detecting and extracting tables from OCR documents or images. This paper proposes an algorithm that detects and extracts multiple tables from OCR document. The algorithm uses a combination of image processing techniques, text recognition and procedural coding to identify distinct tables in same image and map the text to appropriate corresponding cell in dataframe which can be stored as Comma-separated values, Database, Excel and multiple other usable formats.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes