CV | Aug 5, 2021

Fast Convergence of DETR with Spatially Modulated Co-Attention

Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

arXiv:2108.02404v115.522 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the slow convergence of DETR in object detection, reaching higher accuracy in roughly a fifth of the training epochs.

The paper tackles the slow convergence problem of DETR by proposing a Spatially Modulated Co-Attention (SMCA) mechanism, achieving 45.6 mAP at 108 epochs compared to DETR's 43.3 mAP at 500 epochs.

The recently proposed Detection Transformer (DETR) model successfully applies the Transformer to object detection and achieves performance comparable to two-stage object detection frameworks such as Faster R-CNN. However, DETR suffers from slow convergence: training DETR from scratch needs 500 epochs to achieve high accuracy. To accelerate its convergence, we propose a simple yet effective scheme for improving the DETR framework, namely the Spatially Modulated Co-Attention (SMCA) mechanism. The core idea of SMCA is to conduct location-aware co-attention in DETR by constraining co-attention responses to be high near initially estimated bounding box locations. Our proposed SMCA increases DETR's convergence speed by replacing the original co-attention mechanism in the decoder while keeping other operations in DETR unchanged. Furthermore, by integrating multi-head and scale-selection attention designs into SMCA, our fully-fledged SMCA achieves better performance than DETR with a dilated convolution-based backbone (45.6 mAP at 108 epochs vs. 43.3 mAP at 500 epochs). We perform extensive ablation studies on the COCO dataset to validate SMCA. Code is released at https://github.com/gaopengcuhk/SMCA-DETR.
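The core idea in the abstract, keeping co-attention responses high near an initially estimated box location, can be illustrated with a minimal NumPy sketch. The abstract does not state the functional form of the constraint, so this sketch assumes a Gaussian-like weight map centred on the estimated box centre, added in log space to ordinary scaled dot-product attention logits. The names `gaussian_prior` and `modulated_co_attention`, and the parameters `beta` and `eps`, are illustrative and are not taken from the released code. It covers a single query and a single head only.

```python
import numpy as np


def gaussian_prior(height, width, center_xy, scale_xy, beta=1.0):
    """Gaussian-like weight map that peaks at center_xy (x, y in pixels).

    scale_xy controls the spread along x and y; in a detector it could be
    tied to the estimated box width and height (an assumption here).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center_xy
    sx, sy = scale_xy
    return np.exp(-beta * ((xs - cx) ** 2 / sx ** 2 + (ys - cy) ** 2 / sy ** 2))


def modulated_co_attention(query, keys, values, prior, eps=1e-8):
    """Single-query, single-head attention with a spatial prior.

    query:  (d,)        object query
    keys:   (H*W, d)    flattened encoder features
    values: (H*W, d_v)  flattened encoder features
    prior:  (H, W)      spatial weight map, e.g. from gaussian_prior
    """
    d = query.shape[0]
    # Content term: scaled dot product between the query and every location.
    logits = keys @ query / np.sqrt(d)
    # Spatial term: adding log(prior) multiplies the softmax weights by the
    # prior, which suppresses locations far from the estimated centre.
    logits = logits + np.log(prior.reshape(-1) + eps)
    # Numerically stable softmax over all spatial locations.
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()
    return weights @ values, weights
```

With the prior removed (a map of all ones), the function reduces to plain dot-product attention, which matches the abstract's description of SMCA as a replacement for the decoder co-attention that leaves the other operations unchanged.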
