CV | May 23, 2024

Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models

arXiv:2405.14828v2 | 47 citations | h-index: 8 | WACV
Originality: Incremental advance
AI Analysis

This work addresses image quality and control for users of text-to-image diffusion models. It is an incremental advance, building on existing seed-based generation methods.

The study investigates how random seeds affect image generation in text-to-image diffusion models. The best 'golden' seed achieves an FID of 21.60, compared with 31.97 for the worst 'inferior' seed, and a classifier can predict the seed used to generate an image with over 99.9% accuracy. Seeds also influence visual attributes such as grayscale output, image borders, and object composition.

Recent advances in text-to-image (T2I) diffusion models have facilitated creative and photorealistic image synthesis. By varying the random seeds, we can generate many images for a fixed text prompt. Technically, the seed controls the initial noise and, in multi-step diffusion inference, the noise used for reparameterization at intermediate timesteps in the reverse diffusion process. However, the specific impact of the random seed on the generated images remains relatively unexplored. In this work, we conduct a large-scale scientific study into the impact of random seeds during diffusion inference. Remarkably, we reveal that the best 'golden' seed achieved an impressive FID of 21.60, compared to the worst 'inferior' seed's FID of 31.97. Additionally, a classifier can predict the seed number used to generate an image with over 99.9% accuracy in just a few epochs, establishing that seeds are highly distinguishable based on generated images. Encouraged by these findings, we examined the influence of seeds on interpretable visual dimensions. We find that certain seeds consistently produce grayscale images, prominent sky regions, or image borders. Seeds also affect image composition, including object location, size, and depth. Moreover, by leveraging these 'golden' seeds, we demonstrate improved image generation such as high-fidelity inference and diversified sampling. Our investigation extends to inpainting tasks, where we uncover some seeds that tend to insert unwanted text artifacts. Overall, our extensive analyses highlight the importance of selecting good seeds and offer practical utility for image generation.
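The abstract notes that the seed controls both the initial noise and the noise injected at intermediate timesteps. The following is a minimal sketch of that idea only, not the authors' code and not a real diffusion model: `toy_reverse_diffusion`, `sample_noise`, the shrink-toward-zero "denoiser", and all parameter values are hypothetical stand-ins chosen for illustration.

```python
import random


def sample_noise(rng, n):
    """Draw n standard-normal values from the given seeded generator."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def toy_reverse_diffusion(seed, dim=4, steps=5, noise_scale=0.1):
    """Toy stand-in for multi-step diffusion inference (no real model).

    The seed fixes the initial noise and the noise injected at every
    intermediate step, so the whole trajectory is deterministic.
    """
    rng = random.Random(seed)
    x = sample_noise(rng, dim)  # initial noise, analogous to x_T
    for _ in range(steps):
        # Hypothetical "denoiser": shrink values toward zero. A real T2I
        # model would predict noise conditioned on the text prompt here.
        x = [0.9 * v for v in x]
        # Fresh noise for this step, drawn from the same seeded stream.
        eps = sample_noise(rng, dim)
        x = [v + noise_scale * e for v, e in zip(x, eps)]
    return x
```

Calling `toy_reverse_diffusion(0)` twice returns identical values, while a different seed gives a different trajectory. This is the property that makes it possible to study each seed's output in isolation, as the paper does at scale.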

