CV · LG · Mar 12, 2025

DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection

arXiv:2503.09271v4 · 2 citations · h-index: 66
Originality: Incremental advance
AI Analysis

This work addresses flexible and efficient adaptation of open-vocabulary object detectors. It is an incremental advance, since it builds on existing modular deep learning concepts.

The paper tackles the challenge of adapting open-vocabulary object detectors to rare classes and specialized domains by introducing DitHub, a modular framework that uses expert modules managed like version control branches, achieving state-of-the-art performance on benchmarks such as ODinW-13 and ODinW-O.

Open-Vocabulary object detectors can generalize to an unrestricted set of categories through simple textual prompting. However, adapting these models to rare classes or reinforcing their abilities on multiple specialized domains remains essential. While recent methods rely on monolithic adaptation strategies with a single set of weights, we embrace modular deep learning. We introduce DitHub, a framework designed to build and maintain a library of efficient adaptation modules. Inspired by Version Control Systems, DitHub manages expert modules as branches that can be fetched and merged as needed. This modular approach allows us to conduct an in-depth exploration of the compositional properties of adaptation modules, marking the first such study in Object Detection. Our method achieves state-of-the-art performance on the ODinW-13 benchmark and ODinW-O, a newly introduced benchmark designed to assess class reappearance. For more details, visit our project page: https://aimagelab.github.io/DitHub/
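
The version-control analogy in the abstract (a library of expert modules that are fetched and merged as needed) can be illustrated with a small sketch. This is not the DitHub implementation: the `ModuleLibrary` class, its `commit`, `fetch`, and `merge` methods, the `lora_a` parameter name, and the uniform-average merge rule are all hypothetical stand-ins chosen for illustration. The abstract does not say how modules are stored or combined.

```python
from dataclasses import dataclass, field

# A module is modeled as a mapping from parameter name to a flat weight list.
# This representation is an assumption made for illustration only.
Weights = dict[str, list[float]]


@dataclass
class ModuleLibrary:
    """Hypothetical library of adaptation modules, one 'branch' per class."""

    branches: dict[str, Weights] = field(default_factory=dict)

    def commit(self, class_name: str, weights: Weights) -> None:
        # Store a copy so later edits by the caller do not alter the library.
        self.branches[class_name] = {k: list(v) for k, v in weights.items()}

    def fetch(self, class_name: str) -> Weights:
        # Return a copy of the stored module for the requested class.
        return {k: list(v) for k, v in self.branches[class_name].items()}

    def merge(self, class_names: list[str]) -> Weights:
        # Assumed merge rule: element-wise uniform average of the fetched
        # modules. The paper's actual merge strategy may differ.
        fetched = [self.fetch(name) for name in class_names]
        if not fetched:
            raise ValueError("merge needs at least one class name")
        merged: Weights = {}
        for key in fetched[0]:
            columns = zip(*(module[key] for module in fetched))
            merged[key] = [sum(col) / len(fetched) for col in columns]
        return merged


library = ModuleLibrary()
library.commit("zebra", {"lora_a": [1.0, 2.0]})
library.commit("giraffe", {"lora_a": [3.0, 4.0]})
merged = library.merge(["zebra", "giraffe"])
print(merged)
```

Keeping one entry per class is what lets a detector be adapted to a new combination of classes without retraining a single monolithic set of weights. Only the requested modules are fetched and combined.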

