CV, AI, LG | Oct 31, 2023

Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion

arXiv:2311.00056v1 | 8 citations | h-index: 45
Originality: Synthesis-oriented
AI Analysis

The paper addresses a problem faced by researchers and practitioners who want to train machine learning classifiers on synthetic images, and it offers incremental insights into the limitations of current text-to-image systems.

The paper investigates why classifiers trained solely on synthetic images from text-to-image systems like Stable Diffusion perform poorly, despite the images appearing realistic, by analyzing semantic mismatches and identifying four key limitations: ambiguity, adherence to prompt, lack of diversity, and inability to represent underlying concepts.

Recent progress in text-to-image (TTI) systems, such as Stable Diffusion, Imagen, and DALL-E 2, has made it possible to create realistic images with simple text prompts. It is tempting to use these systems to eliminate the manual task of obtaining natural images for training a new machine learning classifier. However, in all of the experiments performed to date, classifiers trained solely with synthetic images perform poorly at inference, despite the images used for training appearing realistic. Examining this apparent incongruity in detail gives insight into the limitations of the underlying image generation processes. Through the lens of diversity in image creation vs. accuracy of what is created, we dissect the differences in semantic mismatches in what is modeled in synthetic vs. natural images. This will elucidate the roles of the image-language model, CLIP, and the image generation model, diffusion. We find four issues that limit the usefulness of TTI systems for this task: ambiguity, adherence to prompt, lack of diversity, and inability to represent the underlying concept. We further present surprising insights into the geometry of CLIP embeddings.
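To make the notion of "diversity" in an embedding space more concrete, the sketch below computes the mean pairwise cosine distance over a set of vectors. This is an illustrative proxy only, not the metric used in the paper: the function names and the toy 3-dimensional vectors are hypothetical, and in practice the inputs would be image embeddings from a model such as CLIP.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine_distance(embeddings):
    """Average of (1 - cosine similarity) over all unordered pairs.

    Assumption: a hypothetical diversity proxy, not the paper's metric.
    Larger values mean the vectors point in more varied directions.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy vectors standing in for image embeddings (made-up values).
tight_cluster = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.0, 0.05]]
spread_out = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

low = mean_pairwise_cosine_distance(tight_cluster)
high = mean_pairwise_cosine_distance(spread_out)
print(f"tight cluster: {low:.4f}, spread out: {high:.4f}")
```

Under this proxy, a set of images whose embeddings cluster tightly scores near zero, while mutually orthogonal embeddings score 1. Comparing the score for a synthetic image set against a natural image set of the same class would be one simple way to probe a diversity gap.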
