CV | Jun 7, 2024

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

arXiv:2406.05184v5 | 20 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This work establishes targeted retrieval of real images as a critical baseline for training vision models on synthetic data, and shows that current synthetic-data methods do not yet surpass it.

The study compares finetuning image classification models on synthetic images generated by Stable Diffusion against finetuning on real images retrieved from LAION-2B, the dataset used to train the generator. Retrieved real data matches or outperforms synthetic data on every task considered. The analysis attributes the gap partly to generator artifacts and inaccurate task-relevant visual details in the synthetic images.

Generative text-to-image models enable us to synthesize unlimited amounts of images in a controllable manner, spurring many recent efforts to train vision models with synthetic data. However, every synthetic image ultimately originates from the upstream data used to train the generator. Does the intermediate generator provide additional information over directly training on relevant parts of the upstream data? Grounding this question in the setting of image classification, we compare finetuning on task-relevant, targeted synthetic data generated by Stable Diffusion -- a generative model trained on the LAION-2B dataset -- against finetuning on targeted real images retrieved directly from LAION-2B. We show that while synthetic data can benefit some downstream tasks, it is universally matched or outperformed by real data from the simple retrieval baseline. Our analysis suggests that this underperformance is partially due to generator artifacts and inaccurate task-relevant visual details in the synthetic images. Overall, we argue that targeted retrieval is a critical baseline to consider when training with synthetic data -- a baseline that current methods do not yet surpass. We release code, data, and models at https://github.com/scottgeng00/unmet-promise.
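To make the retrieval baseline concrete, here is a minimal sketch of one way targeted retrieval could look: rank a pool of upstream images by cosine similarity to a task query and keep the top k. This is an illustration under stated assumptions, not the authors' pipeline. The function name `retrieve_top_k`, the use of precomputed embedding vectors, and the toy 2-D data are all hypothetical; see the released repository for the actual method.

```python
import numpy as np


def retrieve_top_k(query_emb, pool_embs, k):
    """Return (indices, scores) of the k pool rows most cosine-similar to the query.

    Hypothetical helper: assumes each upstream image already has an embedding
    vector (e.g. from some image-text encoder) and that the task is described
    by a single query embedding.
    """
    # Normalise so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q
    # Stable sort on negated scores gives descending order with deterministic ties.
    top = np.argsort(-sims, kind="stable")[:k]
    return top, sims[top]


# Toy pool of four "image" embeddings and one task query (assumed data).
pool = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.0])

idx, scores = retrieve_top_k(query, pool, k=2)
print(idx.tolist())  # indices of the two most task-relevant pool items
```

In the comparison the abstract describes, the retrieved subset and an equally targeted set of generated images would each serve as finetuning data for the same classifier, so that any accuracy difference reflects the data source rather than the training setup.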

Code Implementations: 1 repo