CV | LG | Dec 31, 2025

It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models

Berkeley
arXiv:2601.00090v1 | 3 citations | h-index: 111
Originality: Incremental advance
AI Analysis

The paper addresses mode collapse in text-to-image generation. Its noise-optimization approach is an incremental improvement over existing methods such as guidance or candidate refinement.

The paper tackles mode collapse in text-to-image diffusion models by optimizing the input noise to increase diversity across images generated from the same prompt while preserving the fidelity of the base model. The authors report superior generation quality and variety.

Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. While previous work has attempted to address this issue by steering the model using guidance mechanisms, or by generating a large pool of candidates and refining them, in this work we take a different direction and aim for diversity in generations via noise optimization. Specifically, we show that a simple noise optimization objective can mitigate mode collapse while preserving the fidelity of the base model. We also analyze the frequency characteristics of the noise and show that alternative noise initializations with different frequency profiles can improve both optimization and search. Our experiments demonstrate that noise optimization yields superior results in terms of generation quality and variety.
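The abstract does not state the optimization objective, so the following is only a minimal sketch of the general idea of noise optimization for diversity, not the authors' method. It assumes a hypothetical toy `generator` (a saturating scalar map standing in for a trained diffusion model), an assumed objective (mean pairwise squared distance between outputs, minus a penalty that keeps the noise near the Gaussian prior), and finite-difference gradients in place of backpropagation. All names and constants here are illustrative.

```python
import math
import random


def generator(z):
    """Hypothetical stand-in for a trained model: maps 2-D noise to a scalar."""
    return math.tanh(0.3 * z[0] + 0.3 * z[1])


def diversity(noises):
    """Mean pairwise squared distance between the generated outputs."""
    outs = [generator(z) for z in noises]
    n = len(outs)
    total = sum((outs[i] - outs[j]) ** 2
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def objective(noises, reg=0.05):
    """Assumed objective: diversity minus a penalty on the noise norm."""
    penalty = sum(x * x for z in noises for x in z) / len(noises)
    return diversity(noises) - reg * penalty


def optimize_noise(noises, steps=200, lr=0.5, eps=1e-4):
    """Gradient ascent on the objective using central finite differences."""
    noises = [list(z) for z in noises]  # copy so the input is not mutated
    for _ in range(steps):
        grads = []
        for z in noises:
            g = []
            for k in range(len(z)):
                old = z[k]
                z[k] = old + eps
                up = objective(noises)
                z[k] = old - eps
                down = objective(noises)
                z[k] = old  # restore before moving to the next coordinate
                g.append((up - down) / (2 * eps))
            grads.append(g)
        # Apply all updates only after every gradient has been estimated.
        for z, g in zip(noises, grads):
            for k in range(len(z)):
                z[k] += lr * g[k]
    return noises


random.seed(0)
init = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(4)]
optimized = optimize_noise(init)
before, after = diversity(init), diversity(optimized)
print(f"diversity before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the penalty term plays the role of fidelity preservation: without it, the noise could drift arbitrarily far from the distribution the model was trained on. A real implementation would differentiate through the sampler of the diffusion model and would measure diversity in image or feature space.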

