A Strong and Reproducible Object Detector with Only Public Datasets
This work provides a strong, reproducible object detection solution for researchers and practitioners, though the contribution is incremental because it combines existing components.
The paper tackles the problem of building a reproducible object detector without private data by combining FocalNet-Huge with Stable-DINO, achieving 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev with only 700M parameters.
This work presents Focal-Stable-DINO, a strong and reproducible object detection model that achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters and no test-time augmentation. It explores the combination of the powerful FocalNet-Huge backbone with the effective Stable-DINO detector. Unlike existing SOTA models, which rely on very large parameter counts and complex training techniques applied to large-scale private or merged data, our model is trained exclusively on the publicly available Objects365 dataset, which ensures the reproducibility of our approach.
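For readers unfamiliar with the metric, the AP figures quoted above follow the standard COCO convention of averaging precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below is only an illustration of that thresholding idea, not the paper's evaluation code: the helper names `box_iou` and `matched_thresholds` are hypothetical, and a real evaluation would normally use the official COCO tooling (for example `pycocotools`) rather than this simplified single-box check.

```python
# Illustrative sketch only: helper names are hypothetical, not from the paper.

def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def matched_thresholds(pred, gt):
    """Thresholds at which a predicted box counts as a match for a ground-truth box."""
    iou = box_iou(pred, gt)
    return [t for t in COCO_IOU_THRESHOLDS if iou >= t]

if __name__ == "__main__":
    gt = (0, 0, 100, 100)
    pred = (0, 0, 100, 78)  # covers 78% of the ground-truth box
    print(box_iou(pred, gt))
    print(matched_thresholds(pred, gt))
```

A tightly localized prediction counts as a true positive at more of the thresholds, which is why the averaged AP rewards localization quality and not just detection.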