CV · Mar 14, 2022

TSR-DSAW: Table Structure Recognition via Deep Spatial Association of Words

arXiv:2203.06873v1 · 9 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurately recognizing table structures in documents, with applications in data extraction and digitization. It appears incremental, because it builds on existing text-detection and classification methods.

The paper tackles Table Structure Recognition (TSR) from camera-captured or scanned documents, where existing methods perform poorly on complex tables with nested rows/columns, multi-line text, and missing cell data. It proposes TSR-DSAW, an end-to-end pipeline that uses a deep network to capture spatial associations between word pairs, and reports improvements over previous methods such as TableNet and DeepDeSRT on the PubTabNet and ICDAR 2013 datasets.
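The paper's classifier distinguishes four spatial associations between a pair of words: same-row, same-column, same-cell, or none. To make those labels concrete, here is a minimal sketch that assigns them from bounding-box geometry alone. It is a hypothetical rule-based stand-in written for illustration, not the paper's learned DenseNet-121 classifier; the `Word` box format, the function names, the 0.5 overlap ratio, and the 5-pixel cell gap are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # The four word-pair labels named in the TSR-DSAW abstract.
    SAME_ROW = "same-row"
    SAME_COLUMN = "same-column"
    SAME_CELL = "same-cell"
    NONE = "none"


@dataclass(frozen=True)
class Word:
    # Hypothetical word box in pixel coordinates (origin at top-left).
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def overlap_ratio(a0: float, a1: float, b0: float, b1: float) -> float:
    """Overlap of [a0, a1] and [b0, b1] relative to the shorter interval."""
    shorter = min(a1 - a0, b1 - b0)
    if shorter <= 0:
        return 0.0  # degenerate (zero-width) box
    return max(0.0, min(a1, b1) - max(a0, b0)) / shorter


def gap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Empty space between two intervals (0.0 if they overlap)."""
    return max(0.0, max(a0, b0) - min(a1, b1))


def heuristic_relation(a: Word, b: Word, min_overlap: float = 0.5,
                       max_cell_gap: float = 5.0) -> Relation:
    """Hypothetical geometric baseline, NOT the paper's DenseNet-121 classifier."""
    if overlap_ratio(a.y0, a.y1, b.y0, b.y1) >= min_overlap:
        # Same text line: a narrow horizontal gap suggests one cell.
        if gap(a.x0, a.x1, b.x0, b.x1) <= max_cell_gap:
            return Relation.SAME_CELL
        return Relation.SAME_ROW
    if overlap_ratio(a.x0, a.x1, b.x0, b.x1) >= min_overlap:
        # Stacked lines: a narrow vertical gap suggests a multi-line cell.
        if gap(a.y0, a.y1, b.y0, b.y1) <= max_cell_gap:
            return Relation.SAME_CELL
        return Relation.SAME_COLUMN
    return Relation.NONE


if __name__ == "__main__":
    name = Word("Name", 0, 0, 40, 10)
    age = Word("Age", 100, 0, 130, 10)
    alice = Word("Alice", 0, 30, 40, 40)
    smith = Word("Smith", 0, 42, 35, 52)
    print(heuristic_relation(name, age).value)
    print(heuristic_relation(name, alice).value)
    print(heuristic_relation(alice, smith).value)
```

Running it prints `same-row`, `same-column`, and `same-cell`. The last pair mimics a two-line cell, the kind of multi-line text the paper identifies as difficult; fixed thresholds like these are only meant to make the label definitions concrete.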

Existing methods for Table Structure Recognition (TSR) from camera-captured or scanned documents perform poorly on complex tables consisting of nested rows/columns, multi-line text and missing cell data. This is because current data-driven methods work by simply training deep models on large volumes of data and fail to generalize when an unseen table structure is encountered. In this paper, we propose to train a deep network to capture the spatial associations between different word pairs present in the table image for unravelling the table structure. We present an end-to-end pipeline, named TSR-DSAW: TSR via Deep Spatial Association of Words, which outputs a digital representation of a table image in a structured format such as HTML. Given a table image as input, the proposed method begins with the detection of all the words present in the image using a text-detection network such as CRAFT, followed by the generation of word pairs using dynamic programming. These word pairs are highlighted in individual images and subsequently fed into a DenseNet-121 classifier trained to capture spatial associations such as same-row, same-column, same-cell or none. Finally, we perform post-processing on the classifier output to generate the table structure in HTML format. We evaluate our TSR-DSAW pipeline on two public table-image datasets, PubTabNet and ICDAR 2013, and demonstrate improvement over previous methods such as TableNet and DeepDeSRT.
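The final post-processing step, which turns pairwise classifier output into HTML, can be sketched as follows, assuming the classifier's predictions arrive as `(i, j, label)` triples over the detected words. This is an illustrative reconstruction, not the authors' code: it merges predictions with union-find (same-row and same-cell links define rows; same-column and same-cell links define columns), orders rows and columns by their top-most and left-most words, and emits a plain HTML table in which missing cell data becomes an empty `<td></td>`. The names `Word`, `DisjointSet`, `group_ids`, `rank_groups`, and `to_html` are hypothetical, and spanning cells (`rowspan`/`colspan`) are not handled.

```python
import html
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    # Hypothetical detected word: text plus top-left corner in pixels.
    text: str
    x0: float
    y0: float


class DisjointSet:
    """Union-find that merges transitive pairwise relations into groups."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        self.parent[self.find(i)] = self.find(j)


def group_ids(n, pairs, labels):
    """Group id of each word after merging pairs whose label is in `labels`."""
    ds = DisjointSet(n)
    for i, j, label in pairs:
        if label in labels:
            ds.union(i, j)
    return [ds.find(i) for i in range(n)]


def rank_groups(ids, coords):
    """Map each group id to its position, ordered by the group's smallest coord."""
    first = {}
    for g, v in zip(ids, coords):
        first[g] = min(first.get(g, v), v)
    return {g: k for k, g in enumerate(sorted(first, key=first.__getitem__))}


def to_html(words, pairs):
    """Build a plain HTML table (no rowspan/colspan) from (i, j, label) triples."""
    n = len(words)
    rows = group_ids(n, pairs, {"same-row", "same-cell"})
    cols = group_ids(n, pairs, {"same-column", "same-cell"})
    row_pos = rank_groups(rows, [w.y0 for w in words])  # top to bottom
    col_pos = rank_groups(cols, [w.x0 for w in words])  # left to right

    grid = defaultdict(list)
    for i, w in enumerate(words):
        grid[(row_pos[rows[i]], col_pos[cols[i]])].append(w)

    out = ["<table>"]
    for r in range(len(row_pos)):
        tds = []
        for c in range(len(col_pos)):
            # Reading order inside a cell; an empty list yields <td></td>.
            cell = sorted(grid.get((r, c), []), key=lambda w: (w.y0, w.x0))
            text = " ".join(html.escape(w.text) for w in cell)
            tds.append("<td>" + text + "</td>")
        out.append("<tr>" + "".join(tds) + "</tr>")
    out.append("</table>")
    return "\n".join(out)


if __name__ == "__main__":
    words = [Word("Name", 0, 0), Word("Age", 100, 0), Word("Alice", 0, 30),
             Word("Smith", 0, 42), Word("Bob", 0, 70), Word("30", 100, 30)]
    pairs = [(0, 1, "same-row"), (2, 3, "same-cell"), (2, 5, "same-row"),
             (0, 2, "same-column"), (2, 4, "same-column"),
             (1, 5, "same-column"), (0, 5, "none")]
    print(to_html(words, pairs))
```

Running the module prints a three-row table in which "Alice Smith" occupies one cell and Bob's missing age is an empty `<td></td>`. Union-find suits this step because the labels are transitive in a well-formed grid: if A shares a row with B and B shares a row with C, then A shares a row with C, so whole rows and columns can be recovered without every pair being labeled directly.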

