CV · Apr 30, 2024

Towards End-to-End Semi-Supervised Table Detection with Semantic Aligned Matching Transformer

arXiv:2405.00187v1 · 8 citations · h-index: 26 · ICDAR
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and accurate table detection in semi-supervised settings, which is crucial for document processing applications.

The paper tackles table detection in document images with a semi-supervised approach that uses SAM-DETR to align object queries with target features, reducing false positives and improving performance, especially on complex documents with diverse table structures.

Table detection within document images is a crucial task in document processing, involving the identification and localization of tables. Recent strides in deep learning have substantially improved the accuracy of this task, but it still heavily relies on large labeled datasets for effective training. Several semi-supervised approaches have emerged to overcome this challenge, often employing CNN-based detectors with anchor proposals and post-processing techniques like non-maximal suppression (NMS). However, recent advancements in the field have shifted the focus towards transformer-based techniques, eliminating the need for NMS and emphasizing object queries and attention mechanisms. Previous research has focused on two key areas to improve transformer-based detectors: refining the quality of object queries and optimizing attention mechanisms. However, increasing object queries can introduce redundancy, while adjustments to the attention mechanism can increase complexity. To address these challenges, we introduce a semi-supervised approach employing SAM-DETR, a novel approach for precise alignment between object queries and target features. Our approach demonstrates remarkable reductions in false positives and substantial enhancements in table detection performance, particularly in complex documents characterized by diverse table structures. This work provides more efficient and accurate table detection in semi-supervised settings.

