CV · Mar 17, 2022

Robust Table Detection and Structure Recognition from Heterogeneous Document Images

arXiv:2203.09056v2 · 73 citations · h-index: 32
AI Analysis

This work addresses accurate table extraction from diverse document formats, with applications in document analysis and data extraction. It represents a strong task-specific gain rather than a foundational advance.

The authors tackle table detection and structure recognition in heterogeneous document images with RobusTabNet. Its table detector achieves state-of-the-art performance on cTDaR TrackA, PubLayNet, and IIIT-AR-13K using only a lightweight ResNet-18 backbone, and its table structure recognizer achieves state-of-the-art performance on SciTSR, PubTabNet, and cTDaR TrackB2-Modern.

We introduce a new table detection and structure recognition approach named RobusTabNet to detect the boundaries of tables and reconstruct the cellular structure of each table from heterogeneous document images. For table detection, we propose to use CornerNet as a new region proposal network to generate higher-quality table proposals for Faster R-CNN, which significantly improves the localization accuracy of Faster R-CNN for table detection. Consequently, our table detection approach achieves state-of-the-art performance on three public table detection benchmarks, namely cTDaR TrackA, PubLayNet and IIIT-AR-13K, using only a lightweight ResNet-18 backbone network. Furthermore, we propose a new split-and-merge-based table structure recognition approach, in which a novel spatial CNN-based separation line prediction module splits each detected table into a grid of cells, and a Grid CNN-based cell merging module recovers the spanning cells. Because the spatial CNN module can effectively propagate contextual information across the whole table image, our table structure recognizer can robustly recognize tables with large blank spaces and geometrically distorted (even curved) tables. Thanks to these two techniques, our table structure recognition approach achieves state-of-the-art performance on three public benchmarks, including SciTSR, PubTabNet and cTDaR TrackB2-Modern. Moreover, we further demonstrate the advantages of our approach in recognizing tables with complex structures, large blank spaces, and geometrically distorted or even curved shapes on a more challenging in-house dataset.
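
The split-and-merge idea described in the abstract can be illustrated with a small geometric sketch. This is not the authors' implementation: it assumes the separation lines are already predicted and are straight and axis-aligned (the paper also handles distorted and curved tables), and it assumes the merging stage outputs pairs of adjacent grid cells that belong together. The names `split_into_grid`, `merge_cells`, and `merge_pairs` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels
Coord = Tuple[int, int]           # (row, col) index in the grid


def split_into_grid(table: Box, row_seps: List[int], col_seps: List[int]) -> List[List[Box]]:
    """Split step: cut the table box along separator positions into a grid.

    Assumption: separators are straight, axis-aligned lines given as
    y-coordinates (rows) and x-coordinates (columns).
    """
    x0, y0, x1, y1 = table
    ys = [y0] + sorted(row_seps) + [y1]
    xs = [x0] + sorted(col_seps) + [x1]
    return [
        [(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


def merge_cells(grid: List[List[Box]], merge_pairs: List[Tuple[Coord, Coord]]) -> List[Dict]:
    """Merge step: join grid cells predicted to form one spanning cell.

    Assumption: `merge_pairs` stands in for the output of a learned merging
    module. Cells are grouped with a simple union-find.
    """
    parent = {(r, c): (r, c) for r in range(len(grid)) for c in range(len(grid[0]))}

    def find(node: Coord) -> Coord:
        # Follow parent links to the group representative, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in merge_pairs:
        parent[find(a)] = find(b)

    groups: Dict[Coord, List[Coord]] = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)

    cells = []
    for members in groups.values():
        rows = [r for r, _ in members]
        cols = [c for _, c in members]
        boxes = [grid[r][c] for r, c in members]
        cells.append({
            "row_span": (min(rows), max(rows)),
            "col_span": (min(cols), max(cols)),
            # Bounding box that covers every grid cell in the group.
            "box": (
                min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes),
            ),
        })
    return sorted(cells, key=lambda cell: (cell["row_span"][0], cell["col_span"][0]))


# A 3x3 grid whose first two header cells form one spanning cell.
grid = split_into_grid((0, 0, 300, 90), row_seps=[30, 60], col_seps=[100, 200])
cells = merge_cells(grid, merge_pairs=[((0, 0), (0, 1))])
```

In this toy setup the nine grid cells collapse to eight logical cells, and the first one spans columns 0 to 1. In the approach the abstract describes, both the separator positions and the merge decisions would come from learned modules rather than being supplied by hand.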

