CV · Jan 8, 2024

MS-DETR: Efficient DETR Training with Mixed Supervision

Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang

arXiv:2401.03989v1 · 21.572 citations · h-index: 60 · Has Code · CVPR

Originality: Incremental advance

AI Analysis

This work addresses training efficiency for DETR-based object detectors and is aimed at researchers and practitioners who use those architectures. It is an incremental advance that builds on prior DETR variants.

The paper tackles the inefficiency in DETR training by introducing mixed supervision (one-to-one and one-to-many) to directly supervise object candidate generation, resulting in improved performance over existing DETR variants like DN-DETR, Hybrid DETR, and Group DETR.

DETR accomplishes end-to-end object detection by iteratively generating multiple object candidates based on image features and promoting one candidate for each ground-truth object. The traditional training procedure, which uses one-to-one supervision as in the original DETR, lacks direct supervision for the object detection candidates. We aim to improve DETR training efficiency by explicitly supervising the candidate generation procedure through a mix of one-to-one and one-to-many supervision. Our approach, MS-DETR, is simple: it places one-to-many supervision on the object queries of the primary decoder that is used for inference. In comparison to existing DETR variants with one-to-many supervision, such as Group DETR and Hybrid DETR, our approach needs no additional decoder branches or object queries. The object queries of the primary decoder in our approach benefit directly from one-to-many supervision and are thus superior in object candidate prediction. Experimental results show that our approach outperforms related DETR variants, such as DN-DETR, Hybrid DETR, and Group DETR, and that combining it with related DETR variants further improves performance.
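To make the idea of mixing the two kinds of supervision concrete, here is a minimal sketch, assuming a toy query-by-ground-truth cost matrix. Both assignments are computed over the *same* set of queries, mirroring the abstract's point that MS-DETR adds no extra queries or decoder branches. All function names are hypothetical. The one-to-one step uses brute-force search instead of the Hungarian algorithm that DETR implementations typically use (e.g. `scipy.optimize.linear_sum_assignment`). The one-to-many rule, which takes each ground truth's top-k lowest-cost queries, is an assumed stand-in; the paper's actual rule may differ. Summed matching costs also stand in for the real classification and box losses.

```python
from itertools import permutations


def one_to_one_assign(cost):
    """Optimal one-to-one matching by brute force (illustration only).

    cost[q][g] is the matching cost between query q and ground truth g.
    Returns {gt_index: query_index} minimizing the total cost.
    """
    num_q = len(cost)
    num_g = len(cost[0]) if num_q else 0
    if num_g == 0:
        return {}
    assert num_q >= num_g, "need at least as many queries as ground truths"
    best, best_total = None, float("inf")
    # Try every ordered choice of distinct queries, one per ground truth.
    for queries in permutations(range(num_q), num_g):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total, best = total, queries
    return {g: q for g, q in enumerate(best)}


def one_to_many_assign(cost, k):
    """Hypothetical one-to-many rule: each ground truth takes its k
    lowest-cost queries; a query claimed by several ground truths keeps
    the one with the lowest cost. Returns {query_index: gt_index}."""
    num_q = len(cost)
    num_g = len(cost[0]) if num_q else 0
    assigned = {}
    for g in range(num_g):
        ranked = sorted(range(num_q), key=lambda q: cost[q][g])[:k]
        for q in ranked:
            if q not in assigned or cost[q][g] < cost[q][assigned[q]]:
                assigned[q] = g
    return assigned


def mixed_supervision_loss(cost, k=2, weight=1.0):
    """Toy mixed loss on one shared set of queries:
    one-to-one term + weight * one-to-many term (costs stand in for losses)."""
    o2o = one_to_one_assign(cost)
    o2m = one_to_many_assign(cost, k)
    loss_o2o = sum(cost[q][g] for g, q in o2o.items())
    loss_o2m = sum(cost[q][g] for q, g in o2m.items())
    return loss_o2o + weight * loss_o2m


if __name__ == "__main__":
    # 4 queries (rows) x 2 ground-truth objects (columns).
    cost = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
    print(one_to_one_assign(cost))      # {0: 0, 1: 2}
    print(one_to_many_assign(cost, 2))  # {0: 0, 1: 0, 2: 1, 3: 1}
    print(mixed_supervision_loss(cost))
```

In the example, one-to-one matching gives each object a single positive query. The one-to-many term gives every query a target, so the same primary queries receive denser supervision during training. At inference only the one-to-one behavior matters, which is why no duplicate-removal step is needed.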

