LGCVFeb 12, 2024

ClusterTabNet: Supervised clustering method for table detection and table structure recognition

arXiv:2402.07502v23 citationsh-index: 2ICDAR
AI Analysis

This addresses the problem of efficiently extracting table data from documents for applications like data analysis, though it appears incremental as it builds on existing deep learning approaches.

The paper tackles table detection and structure recognition by clustering words using a transformer encoder to predict adjacency matrices, achieving similar or better accuracy than state-of-the-art methods like DETR and Faster R-CNN with a significantly smaller model on datasets such as PubTables-1M, PubTabNet, and FinTabNet.

We present a novel deep-learning-based method to cluster words in documents which we apply to detect and recognize tables given the OCR output. We interpret table structure bottom-up as a graph of relations between pairs of words (belonging to the same row, column, header, as well as to the same table) and use a transformer encoder model to predict its adjacency matrix. We demonstrate the performance of our method on the PubTables-1M dataset as well as PubTabNet and FinTabNet datasets. Compared to the current state-of-the-art detection methods such as DETR and Faster R-CNN, our method achieves similar or better accuracy, while requiring a significantly smaller model.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes