CV · AI · LG · Dec 4, 2023

Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images

arXiv:2312.02253v2 · 22 citations · h-index: 16 · Trans. Mach. Learn. Res.
AI Analysis

This work addresses data scarcity in computer vision by enabling scalable training on synthetic images, though its contribution is incremental, since it builds on existing generative and domain-adaptation techniques.

The paper tackles the problem of scaling up visual recognition training with synthetic images without fine-tuning the generative model, and shows that performance keeps improving as the synthetic set grows to six times the size of the original ImageNet.

Recent advances in generative deep learning have enabled the creation of high-quality synthetic images in text-to-image generation. Prior work shows that fine-tuning a pretrained diffusion model on ImageNet and generating synthetic training images from the fine-tuned model can enhance an ImageNet classifier's performance. However, performance degrades as synthetic images outnumber real ones. In this paper, we explore whether generative fine-tuning is essential for this improvement and whether it is possible to further scale up training using more synthetic data. We present a new framework that leverages off-the-shelf generative models to generate synthetic training images, addressing three challenges: class name ambiguity, lack of diversity in naive prompts, and domain shift. Specifically, we leverage large language models (LLMs) and CLIP to resolve class name ambiguity. To diversify images, we propose contextualized diversification (CD) and stylized diversification (SD) methods, also prompted by LLMs. Finally, to mitigate domain shift, we leverage domain adaptation techniques with auxiliary batch normalization for synthetic images. Our framework consistently enhances recognition model performance with more synthetic data, up to 6x the original ImageNet size, showcasing the potential of synthetic data for improved recognition models and strong out-of-domain generalization.
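The auxiliary batch normalization step can be illustrated with a short, dependency-free Python sketch. It assumes the auxiliary-BN idea that AdvProp introduced for adversarial examples: real and synthetic batches share every other layer but are normalized by separate BN layers, and only the real-image BN is used at inference. The class names `SimpleBatchNorm` and `AuxiliaryBN` are hypothetical, the running variance uses the biased estimate for simplicity, and nothing here is taken from the authors' code.

```python
import math


class SimpleBatchNorm:
    """Per-feature batch norm over a batch of feature vectors (lists of floats)."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # Affine parameters (left at their initial values in this sketch).
        self.gamma = [1.0] * num_features
        self.beta = [0.0] * num_features
        # Running statistics used at inference time.
        self.running_mean = [0.0] * num_features
        self.running_var = [1.0] * num_features

    def __call__(self, batch, training=True):
        n = len(batch)
        out = [[0.0] * len(self.gamma) for _ in range(n)]
        for j in range(len(self.gamma)):
            column = [row[j] for row in batch]
            if training:
                # Normalize with the current batch statistics (biased variance).
                mean = sum(column) / n
                var = sum((x - mean) ** 2 for x in column) / n
                # Update the exponential moving averages of the statistics.
                m = self.momentum
                self.running_mean[j] = (1 - m) * self.running_mean[j] + m * mean
                self.running_var[j] = (1 - m) * self.running_var[j] + m * var
            else:
                # Inference: use the accumulated running statistics.
                mean, var = self.running_mean[j], self.running_var[j]
            scale = self.gamma[j] / math.sqrt(var + self.eps)
            for i, x in enumerate(column):
                out[i][j] = (x - mean) * scale + self.beta[j]
        return out


class AuxiliaryBN:
    """Separate BN layers for real and synthetic images (hypothetical sketch)."""

    def __init__(self, num_features):
        self.bn = {
            "real": SimpleBatchNorm(num_features),
            "synthetic": SimpleBatchNorm(num_features),
        }

    def __call__(self, batch, domain="real", training=True):
        # Route the batch to the BN layer matching its domain; in a full
        # network, all other layers would be shared between the two domains.
        return self.bn[domain](batch, training)
```

In a training loop, a real batch would call `aux(x_real, domain="real")` and a synthetic batch `aux(x_syn, domain="synthetic")`. At test time, `aux(x, domain="real", training=False)` normalizes with the real-image running statistics, so the statistics shifted by synthetic images never reach evaluation.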
