ClusterTabNet: Supervised clustering method for table detection and table structure recognition
This work addresses the problem of efficiently extracting table data from documents for applications such as data analysis. It appears incremental, as it builds on existing deep learning approaches.
The paper tackles table detection and table structure recognition by clustering words: a transformer encoder predicts the adjacency matrices of the relations between word pairs. On PubTables-1M, PubTabNet, and FinTabNet, it reports accuracy similar to or better than state-of-the-art detection methods such as DETR and Faster R-CNN, with a significantly smaller model.
We present a novel deep-learning-based method to cluster words in documents, which we apply to detect and recognize tables given the OCR output. We interpret table structure bottom-up as a graph of relations between pairs of words (belonging to the same row, column, header, as well as to the same table) and use a transformer encoder model to predict its adjacency matrix. We demonstrate the performance of our method on the PubTables-1M dataset as well as the PubTabNet and FinTabNet datasets. Compared to the current state-of-the-art detection methods such as DETR and Faster R-CNN, our method achieves similar or better accuracy, while requiring a significantly smaller model.
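To make the adjacency-matrix view concrete, the following is a minimal sketch of one relation (for instance "same row"), not the paper's implementation. It builds the binary target matrix from per-word group labels and then recovers word clusters from a matrix of pairwise scores. The function names, the use of -1 for words outside any group, the 0.5 threshold, and the connected-components decoding are all assumptions made for illustration; the abstract does not say how clusters are read off the predicted matrix.

```python
import numpy as np


def adjacency_from_labels(labels):
    """Target matrix: A[i, j] = 1 when words i and j share a group id (e.g. the same row)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    valid = labels >= 0  # assumed convention: -1 marks a word outside any group
    return (same & valid[:, None] & valid[None, :]).astype(np.int8)


def clusters_from_adjacency(scores, threshold=0.5):
    """Assumed decoding step: connected components of the thresholded, symmetrized matrix."""
    scores = np.asarray(scores, dtype=float)
    linked = np.maximum(scores, scores.T) >= threshold
    n = linked.shape[0]
    cluster_ids = [-1] * n
    next_id = 0
    for start in range(n):
        if cluster_ids[start] != -1:
            continue
        # Depth-first traversal over all words linked to this one.
        cluster_ids[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if cluster_ids[j] == -1:
                    cluster_ids[j] = next_id
                    stack.append(int(j))
        next_id += 1
    return cluster_ids


# Five words: two in row 0, two in row 1, one outside the table.
row_labels = [0, 0, 1, 1, -1]
target = adjacency_from_labels(row_labels)
recovered = clusters_from_adjacency(target)
```

In this sketch a word with no links above the threshold ends up in its own singleton cluster. The same two functions would apply unchanged to the column, header, and table relations, each with its own matrix.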