CV | Feb 8, 2024

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

arXiv:2402.05937v3 | 43 citations | h-index: 19 | CVPR
AI Analysis

This work addresses the challenge of improving object detection for applications such as robotics and surveillance by providing a scalable data synthesis method, though the contribution is incremental because it builds on existing diffusion models.

The paper tackles the problem of enhancing object detectors by training them on synthetic data generated from diffusion models, reporting gains over existing state-of-the-art methods of +4.5 AP in open-vocabulary and +1.2 to +5.2 AP in data-sparse scenarios.

In this paper, we present a novel paradigm to enhance the ability of an object detector, e.g., expanding categories or improving detection performance, by training on a synthetic dataset generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained generative diffusion model to augment it with the ability to localise instances in the generated images. The grounding head is trained to align the text embedding of category names with the regional visual features of the diffusion model, using supervision from an off-the-shelf object detector and a novel self-training scheme on (novel) categories not covered by the detector. We conduct thorough experiments to show that this enhanced version of the diffusion model, termed InstaGen, can serve as a data synthesizer to enhance object detectors by training on its generated samples, demonstrating superior performance over existing state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 to 5.2 AP) scenarios. Project page with code: https://fcjian.github.io/InstaGen.
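
The alignment idea in the abstract (matching category-name text embeddings against regional visual features) can be illustrated with a toy sketch. This is not the paper's implementation: the function names `cosine` and `ground_regions`, the 3-dimensional vectors, and the 0.5 threshold are all hypothetical choices made for illustration, and plain cosine similarity stands in for whatever scoring the actual grounding head learns.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ground_regions(region_feats, text_embeds, threshold=0.5):
    """Label each region with its best-matching category name.

    A region whose best similarity falls below `threshold` gets None,
    meaning no category is assigned. The threshold is an assumed value.
    """
    labels = []
    for feat in region_feats:
        scores = {name: cosine(feat, emb) for name, emb in text_embeds.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else None)
    return labels

# Toy stand-ins: real text embeddings and region features would come from
# a text encoder and the diffusion model, respectively.
text_embeds = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
region_feats = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]]

labels = ground_regions(region_feats, text_embeds)
print(labels)  # ['cat', 'dog', None]
```

In this sketch the third region is orthogonal to both category embeddings, so it is left unlabelled rather than forced into a category.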

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
