CV · Dec 19, 2024

Dataset Augmentation by Mixing Visual Concepts

arXiv:2412.15358v1 · h-index: 11 · WACV
Originality: Incremental advance
AI Analysis

This paper addresses dataset augmentation for computer vision tasks, but the advance is incremental: it builds on existing diffusion models and augmentation methods.

The paper tackles the domain discrepancy between real data and diffusion-generated images in dataset augmentation. It fine-tunes pre-trained diffusion models with a novel Mixing Visual Concepts (MVC) procedure and reports outperforming state-of-the-art augmentation techniques on benchmark classification tasks.

This paper proposes a dataset augmentation method based on fine-tuning pre-trained diffusion models. Generating images using a pre-trained diffusion model with textual conditioning often results in domain discrepancy between real data and generated images. We propose a fine-tuning approach where we adapt the diffusion model by conditioning it on real images and novel text embeddings. We introduce a unique procedure called Mixing Visual Concepts (MVC), in which we create novel text embeddings from image captions. MVC enables us to generate multiple images that are diverse yet similar to the real data, which makes effective dataset augmentation possible. We perform comprehensive qualitative and quantitative evaluations of the proposed dataset augmentation approach, showcasing both coarse-grained and fine-grained changes in generated images. Our approach outperforms state-of-the-art augmentation techniques on benchmark classification tasks.
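
The abstract says that MVC creates novel text embeddings from image captions, but it does not say how the mixing is done. As a rough illustration only, the sketch below assumes the simplest possible scheme: linear interpolation between two caption embeddings of equal length. The function names `mix_embeddings` and `sample_mixed_embeddings`, the mixing weight `alpha`, and the plain-list embedding format are all hypothetical and are not taken from the paper.

```python
import random


def mix_embeddings(emb_a, emb_b, alpha):
    """Linearly interpolate two equal-length embedding vectors.

    ASSUMPTION: linear interpolation stands in for the paper's actual
    MVC procedure, which the abstract does not describe.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(emb_a, emb_b)]


def sample_mixed_embeddings(caption_embeddings, n_samples, seed=0):
    """Draw n_samples mixed embeddings from random pairs of captions.

    Hypothetical helper: each sample picks two distinct caption
    embeddings and a random mixing weight in [0, 1).
    """
    if len(caption_embeddings) < 2:
        raise ValueError("need at least two caption embeddings to mix")
    rng = random.Random(seed)
    mixed = []
    for _ in range(n_samples):
        emb_a, emb_b = rng.sample(caption_embeddings, 2)
        mixed.append(mix_embeddings(emb_a, emb_b, rng.random()))
    return mixed
```

In a real pipeline the mixed vectors would replace the text-encoder output that conditions the fine-tuned diffusion model, so each mixed embedding would yield one augmented image. That wiring is omitted here because it depends on the model and library in use.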

