CV | Jul 17, 2024

CerberusDet: Unified Multi-Dataset Object Detection

arXiv:2407.12632v2 | 5 citations | h-index: 4 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of handling multiple object detection tasks efficiently without compromising performance. Its contribution is an incremental improvement in multi-dataset training efficiency.

The paper tackles the problem of object detection models being limited by fixed categories and dataset incompatibilities. It introduces CerberusDet, a multi-headed framework that achieves state-of-the-art results with 36% less inference time, evaluated on the PASCAL VOC and Objects365 datasets.

Conventional object detection models are usually limited by the data on which they were trained and by the category logic they define. With the recent rise of Language-Visual Models, new methods have emerged that are not restricted to these fixed categories. Despite their flexibility, such Open Vocabulary detection models still fall short in accuracy compared to traditional models with fixed classes. At the same time, more accurate data-specific models face challenges when there is a need to extend classes or merge different datasets for training. The latter often cannot be combined due to different logics or conflicting class definitions, making it difficult to improve a model without compromising its performance. In this paper, we introduce CerberusDet, a framework with a multi-headed model designed for handling multiple object detection tasks. The proposed model is built on the YOLO architecture and efficiently shares visual features from both backbone and neck components, while maintaining separate task heads. This approach allows CerberusDet to perform very efficiently while still delivering optimal results. We evaluated the model on the PASCAL VOC dataset and the Objects365 dataset to demonstrate its abilities. CerberusDet achieved state-of-the-art results with 36% less inference time. The more tasks are trained together, the more efficient the proposed model becomes compared to running individual models sequentially. The training and inference code, as well as the model, are available as open source (https://github.com/ai-forever/CerberusDet).
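The claim that efficiency grows with the number of tasks can be illustrated with a simple cost model. The sketch below is not taken from the paper or its repository. It assumes a fully shared trunk (backbone plus neck) that runs once per image, and one lightweight head per task. The function names and the cost values (8.0 for the trunk, 2.0 per head, in arbitrary time units) are hypothetical and chosen only for illustration. The real model may share only part of the neck, so actual savings would differ.

```python
def sequential_cost(num_tasks: int, trunk_cost: float, head_cost: float) -> float:
    """Cost of running one complete model (trunk + head) for every task."""
    return num_tasks * (trunk_cost + head_cost)


def shared_cost(num_tasks: int, trunk_cost: float, head_cost: float) -> float:
    """Cost of running the shared trunk once, then one head per task."""
    return trunk_cost + num_tasks * head_cost


def relative_saving(num_tasks: int, trunk_cost: float, head_cost: float) -> float:
    """Fraction of inference cost saved by sharing the trunk across tasks."""
    seq = sequential_cost(num_tasks, trunk_cost, head_cost)
    return 1.0 - shared_cost(num_tasks, trunk_cost, head_cost) / seq


if __name__ == "__main__":
    # Hypothetical costs in arbitrary time units, not measurements.
    for n in (1, 2, 3, 4):
        saving = relative_saving(n, trunk_cost=8.0, head_cost=2.0)
        print(f"{n} task(s): {saving:.1%} saved vs. sequential models")
```

Under these assumed costs, a single task saves nothing, and the saving rises as tasks are added because the trunk cost is paid only once. The saving is bounded above by the trunk's share of a single model's cost, here 8.0 / (8.0 + 2.0) = 80%.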

Code Implementations: 1 repo
