Towards End-to-End Semi-Supervised Table Detection with Deformable Transformer
This work addresses the need for labeled data in table detection with a semi-supervised solution that could reduce annotation costs for document analysis tasks, though its improvement over existing methods appears incremental.
The paper tackles table detection in document images by proposing a novel end-to-end semi-supervised method built on a deformable transformer. With 10% of the labels, it outperforms the fully supervised Deformable transformer by +3.4 points on TableBank-both and the previous CNN-based semi-supervised approach (Soft Teacher) by +1.8 points on PubLayNet.
Table detection is the task of classifying and localizing table objects within document images. Recent advances in deep learning have led to remarkable success in table detection. However, these models require a significant amount of labeled data to train effectively. Many semi-supervised approaches have been introduced to reduce the need for large amounts of labeled data. These approaches use CNN-based detectors that rely on anchor proposals and post-processing stages such as non-maximum suppression (NMS). To tackle these limitations, this paper presents a novel end-to-end semi-supervised table detection method that employs the deformable transformer for detecting table objects. We evaluate our semi-supervised method on the PubLayNet, DocBank, ICDAR-19 and TableBank datasets, where it achieves superior performance compared to previous methods. It outperforms the fully supervised method (Deformable transformer) by +3.4 points on 10% labels of the TableBank-both dataset, and the previous CNN-based semi-supervised approach (Soft Teacher) by +1.8 points on 10% labels of the PubLayNet dataset. We hope this work opens new possibilities towards semi-supervised and unsupervised table detection methods.
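CNN-based detectors typically emit many overlapping candidate boxes for the same table and depend on NMS to prune the duplicates; the end-to-end deformable-transformer approach described in the abstract avoids this post-processing stage. For readers unfamiliar with NMS, the following is a minimal sketch of a generic greedy IoU-based variant. It is an illustration only, not code from the paper; the box format (x1, y1, x2, y2), the function names `iou` and `nms`, and the 0.5 IoU threshold are assumptions.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: repeatedly keep the highest-scoring remaining box and
    discard every remaining box whose IoU with it exceeds the threshold.
    Returns indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    # Two near-duplicate proposals for one table, plus a separate table lower on the page.
    boxes = [(10, 10, 200, 100), (12, 12, 198, 102), (10, 300, 200, 400)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]
```

The hand-tuned IoU threshold is one reason such post-processing is considered a limitation: set it too low and adjacent tables get merged, too high and duplicates survive. An end-to-end detector instead predicts a set of boxes directly.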