CV · Mar 3, 2025

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism

arXiv:2503.01463v1 · 16 citations · h-index: 4 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses object detection in natural scenes, where objects may be small or occluded, for computer vision applications. It proposes a new method for a known bottleneck.

The paper tackles the limited use of image information in the cascaded decoder architectures of DETR-like object detection models. It proposes a new decoder with a parallel Multi-time Inquiries (MI) mechanism, which improves performance by +2.3 AP over DINO and +0.6 AP over Relation-DETR on the COCO benchmark with a ResNet-50 backbone.

Based on an analysis of the characteristics of the cascaded decoder architecture commonly adopted in existing DETR-like models, this paper proposes a new decoder architecture. The cascaded decoder architecture constrains object queries to update in the cascaded direction, so object queries can learn only relatively limited information from image features. However, the challenges of object detection in natural scenes (e.g., objects that are extremely small, heavily occluded, or confusingly mixed with the background) require an object detection model to fully utilize image features, which motivates us to propose a new decoder architecture with the parallel Multi-time Inquiries (MI) mechanism. MI enables object queries to learn more comprehensive information, and our MI-based model, MI-DETR, outperforms all existing DETR-like models on the COCO benchmark under different backbones and training epochs, achieving +2.3 AP and +0.6 AP improvements over the most representative model, DINO, and the SOTA model, Relation-DETR, respectively, under a ResNet-50 backbone. In addition, a series of diagnostic and visualization experiments demonstrate the effectiveness, rationality, and interpretability of MI.
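To make the contrast in the abstract concrete, here is a toy NumPy sketch of one cross-attention inquiry per decoder layer versus several parallel inquiries whose answers are fused. It is an illustration under stated assumptions, not the MI-DETR architecture. The abstract does not specify how inquiries are parameterized or combined, so the function names (`cascaded_layer`, `multi_inquiry_layer`), the random projection weights, and the mean fusion are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, features, w_q, w_k, w_v):
    # Single-head cross-attention: object queries "inquire" image features.
    q = queries @ w_q
    k = features @ w_k
    v = features @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

def cascaded_layer(queries, features, params):
    # Assumption: a conventional layer makes one inquiry, then passes on.
    return queries + cross_attention(queries, features, *params[0])

def multi_inquiry_layer(queries, features, params):
    # Assumption: several inquiries run in parallel on the same queries,
    # and their answers are fused by a plain mean (hypothetical choice).
    answers = [cross_attention(queries, features, *p) for p in params]
    return queries + np.mean(answers, axis=0)

def make_params(rng, dim, n_inquiries):
    # One (w_q, w_k, w_v) triple of random projections per inquiry.
    return [
        tuple(rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
        for _ in range(n_inquiries)
    ]

rng = np.random.default_rng(0)
dim = 8
features = rng.standard_normal((20, dim))  # 20 flattened image tokens
queries = rng.standard_normal((5, dim))    # 5 object queries
params = make_params(rng, dim, n_inquiries=3)

cascaded_out = cascaded_layer(queries, features, params)
parallel_out = multi_inquiry_layer(queries, features, params)
print(cascaded_out.shape, parallel_out.shape)
```

With a single inquiry the two layers coincide, so the sketch isolates only the effect of asking the image features more than once per layer. Any real implementation would follow the paper's own decoder design.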

Code Implementations: 1 repo
