CV · Mar 2, 2022

D^2ETR: Decoder-Only DETR with Computationally Efficient Cross-Scale Attention

arXiv:2203.00860v1 · 32 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency for object detection in computer vision, representing an incremental improvement over existing DETR-based methods.

The paper tackles DETR's high computational cost and slow convergence by proposing D^2ETR, a decoder-only detector with a novel cross-scale attention module. On the COCO benchmark, the authors report lower computational complexity and higher detection accuracy than DETR and its variants.

DETR is the first fully end-to-end detector that outputs its final set of predictions without post-processing. However, it suffers from problems such as low performance and slow convergence. A series of works aim to tackle these issues in different ways, but the computational cost is still high due to the sophisticated encoder-decoder architecture. To alleviate this issue, we propose a decoder-only detector called D^2ETR. In the absence of an encoder, the decoder directly attends to the fine-fused feature maps generated by the Transformer backbone with a novel computationally efficient cross-scale attention module. D^2ETR demonstrates low computational complexity and high detection accuracy in evaluations on the COCO benchmark, outperforming DETR and its variants.
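To make the decoder-only idea concrete, here is a minimal sketch of object queries attending directly to backbone feature tokens from several scales, with no encoder in between. It is an illustration under stated assumptions, not the paper's implementation: it uses plain dense single-head attention with identity projections, it does not reproduce the cross-scale attention module or the feature fusion, and the names `flatten_scales` and `cross_attend` and the toy feature maps are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def flatten_scales(feature_maps):
    # Concatenate the tokens of every scale into one flat list.
    # Each feature map is an H x W grid of d-dimensional vectors.
    tokens = []
    for fmap in feature_maps:
        for row in fmap:
            tokens.extend(row)
    return tokens

def cross_attend(queries, tokens):
    # Single-head scaled dot-product attention from queries to tokens.
    # Assumption: identity query/key/value projections, for brevity.
    d = len(tokens[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * k[j] for w, k in zip(weights, tokens))
                        for j in range(d)])
    return outputs

# Toy backbone output (hypothetical values): a 2x2 fine map and a 1x1 coarse map.
fine = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
coarse = [[[0.5, 0.5]]]

tokens = flatten_scales([fine, coarse])   # 4 + 1 = 5 tokens
queries = [[0.0, 0.0], [2.0, 0.0]]        # two object queries
decoded = cross_attend(queries, tokens)   # one output vector per query
```

The all-zero query weights every token equally and so returns the mean token, while the second query leans toward tokens with a large first component. Dense attention like this costs one score per query-token pair, which is the kind of cost an efficient cross-scale design would aim to reduce.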
