CV · Nov 7, 2022

Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining

arXiv:2211.03594v1 · 55 citations · h-index: 60
Originality: Synthesis-oriented
AI Analysis

The work improves object detection accuracy for computer vision applications, but it is incremental, since it builds on existing methods such as DETR, DINO, and Group DETR.

The paper tackles object detection with Group DETR v2, which combines encoder-decoder pretraining with finetuning. It reaches 64.5 mAP on COCO test-dev, setting a new state of the art.

We present a strong object detector with encoder-decoder pretraining and finetuning. Our method, called Group DETR v2, is built upon a vision transformer encoder, ViT-Huge [dosovitskiy2020image], a DETR variant, DINO [zhang2022dino], and an efficient DETR training method, Group DETR [chen2022group]. The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Object365, and finally finetuning it on COCO. Group DETR v2 achieves 64.5 mAP on COCO test-dev and establishes a new SoTA on the COCO leaderboard (https://paperswithcode.com/sota/object-detection-on-coco).
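The four-stage schedule described in the abstract can be written down as an ordered list of stages. This is a minimal sketch under stated assumptions: the names `Stage`, `PIPELINE`, and `describe`, and the stage labels, are hypothetical and do not come from the paper. The sketch records only the order of components and datasets, not any training code, losses, or hyperparameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the training schedule (hypothetical illustration)."""
    name: str       # hypothetical stage label, not from the paper
    component: str  # which part of the model is trained in this stage
    dataset: str
    objective: str


# Stage order as stated in the abstract; field values are illustrative only.
PIPELINE = [
    Stage("encoder_pretrain", "ViT-Huge encoder", "ImageNet-1K", "self-supervised pretraining"),
    Stage("encoder_finetune", "ViT-Huge encoder", "ImageNet-1K", "finetuning"),
    Stage("detector_pretrain", "detector (DINO + Group DETR)", "Object365", "detection pretraining"),
    Stage("detector_finetune", "detector (DINO + Group DETR)", "COCO", "detection finetuning"),
]


def describe(pipeline):
    """Return one human-readable line per stage, in execution order."""
    return [
        f"{i}. {s.name}: {s.component} on {s.dataset} ({s.objective})"
        for i, s in enumerate(pipeline, start=1)
    ]


if __name__ == "__main__":
    for line in describe(PIPELINE):
        print(line)
```

Keeping the schedule as data makes the ordering explicit: the encoder stages on ImageNet-1K must finish before the full detector is pretrained on Object365 and then finetuned on COCO.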
