Plain-Det: A Plain Multi-Dataset Object Detector
This work addresses data scarcity in dense computer vision tasks such as object detection, a problem relevant to both researchers and practitioners. It is incremental in that it builds on existing methods such as Def-DETR.
The paper tackles the challenge of training object detectors on a combination of multiple datasets to overcome annotation difficulties. It achieves an mAP of 51.9 on COCO, matching state-of-the-art detectors, and demonstrates strong generalization across 13 downstream datasets.
Recent advancements in large-scale foundational models have sparked widespread interest in training highly proficient large vision models. There is a broad consensus that this requires aggregating extensive, high-quality annotated data. However, given the inherent challenges in annotating dense tasks in computer vision, such as object detection and segmentation, a practical strategy is to combine and leverage all available data for training. In this work, we propose Plain-Det, which offers flexibility to accommodate new datasets, robust performance across diverse datasets, training efficiency, and compatibility with various detection architectures. Using Def-DETR with the assistance of Plain-Det, we achieve an mAP of 51.9 on COCO, matching the current state-of-the-art detectors. We conduct extensive experiments on 13 downstream datasets, and Plain-Det demonstrates strong generalization capability. Code is released at https://github.com/ChengShiest/Plain-Det