CV · Mar 6, 2025

Fractional Correspondence Framework in Detection Transformer

arXiv:2503.04107v1 · 1 citation · h-index: 24
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in object detection for computer vision researchers, offering an incremental improvement over existing DETR-based methods.

The paper tackles suboptimal object matching in Detection Transformers (DETR) caused by strict one-to-one assignments. It proposes a Regularized Transport Plan (RTP) that enables soft, fractional matching to better handle varying object densities and distributions, yielding absolute mAP gains of +3.8% over Deform-DETR and +1.7% over DINO-DETR on the MS-COCO and VOC benchmarks.

The Detection Transformer (DETR), by incorporating the Hungarian algorithm, has significantly simplified the matching process in object detection tasks. This algorithm facilitates optimal one-to-one matching of predicted bounding boxes to ground-truth annotations during training. While effective, this strict matching process does not inherently account for the varying densities and distributions of objects, leading to suboptimal correspondences such as failing to handle multiple detections of the same object or missing small objects. To address this, we propose the Regularized Transport Plan (RTP). RTP introduces a flexible matching strategy that captures the cost of aligning predictions with ground truths to find the most accurate correspondences between these sets. By utilizing the differentiable Sinkhorn algorithm, RTP allows for soft, fractional matching rather than strict one-to-one assignments. This approach enhances the model's capability to manage varying object densities and distributions effectively. Our extensive evaluations on the MS-COCO and VOC benchmarks demonstrate the effectiveness of our approach. RTP-DETR surpasses the performance of Deform-DETR and the recently introduced DINO-DETR, achieving absolute gains in mAP of +3.8% and +1.7%, respectively.
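
To make the contrast between one-to-one assignment and soft, fractional matching concrete, here is a minimal sketch of entropy-regularized optimal transport solved with Sinkhorn iterations. It is not the paper's RTP implementation. The function name `sinkhorn_plan`, the toy cost matrix, the uniform marginals, and the `epsilon` value are all assumptions chosen for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, row_mass, col_mass, epsilon=0.1, n_iters=100):
    """Return an entropy-regularized transport plan (hypothetical helper).

    cost:     (n_pred, n_gt) matching cost between predictions and ground truths
    row_mass: (n_pred,) mass assigned to each prediction (sums to 1)
    col_mass: (n_gt,)   mass assigned to each ground truth (sums to 1)
    epsilon:  regularization strength; smaller values give harder matchings
    """
    kernel = np.exp(-cost / epsilon)      # Gibbs kernel of the cost matrix
    u = np.ones_like(row_mass)
    v = np.ones_like(col_mass)
    for _ in range(n_iters):
        u = row_mass / (kernel @ v)       # rescale to match row marginals
        v = col_mass / (kernel.T @ u)     # rescale to match column marginals
    return u[:, None] * kernel * v[None, :]


# Toy example (assumed values): 3 predictions, 2 ground-truth boxes.
# Prediction 0 fits GT 0, prediction 1 fits GT 1, prediction 2 is ambiguous.
cost = np.array([[0.1, 0.9],
                 [0.9, 0.1],
                 [0.5, 0.5]])
row_mass = np.full(3, 1.0 / 3.0)          # uniform mass over predictions
col_mass = np.full(2, 1.0 / 2.0)          # uniform mass over ground truths

plan = sinkhorn_plan(cost, row_mass, col_mass)
print(np.round(plan, 4))
```

In this toy setup the ambiguous third prediction splits its mass evenly between both ground truths, which a strict one-to-one assignment cannot express. Because every step is an elementwise or matrix operation, the same computation is differentiable when written in an autodiff framework.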

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
