Generating on Generated: An Approach Towards Self-Evolving Diffusion Models
The paper addresses a specific challenge in recursive self-improvement for diffusion models. The contribution is incremental: it builds on existing methods to improve training stability.
This paper tackles the problem of training collapse in text-to-image diffusion models caused by synthetic data, proposing strategies to mitigate perceptual misalignment and generative hallucinations, with experiments validating their effectiveness.
Recursive Self-Improvement (RSI) enables intelligent systems to autonomously refine their capabilities. This paper explores the application of RSI to text-to-image diffusion models, addressing the challenge of training collapse caused by synthetic data. We identify two key factors contributing to this collapse: the lack of perceptual alignment and the accumulation of generative hallucinations. To mitigate these issues, we propose three strategies: (1) a prompt construction and filtering pipeline designed to facilitate the generation of perceptually aligned data, (2) a preference sampling method to identify human-preferred samples and filter out generative hallucinations, and (3) a distribution-based weighting scheme to penalize selected samples that contain hallucinatory errors. Our extensive experiments validate the effectiveness of these approaches.
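The three strategies can be read as successive stages of one data-selection round. The sketch below is only an illustration of that reading, not the paper's implementation: the names `Sample`, `filter_prompts`, `preference_sample`, and `distribution_weights` are hypothetical, the preference score and image features are assumed to come from external models that are not shown, and the exponential distance weighting is an assumed stand-in for the distribution-based scheme, whose exact form the abstract does not specify.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    prompt: str
    features: Sequence[float]  # hypothetical embedding of the generated image
    preference: float          # hypothetical preference score, higher is better


def filter_prompts(prompts: List[str], keep: Callable[[str], bool]) -> List[str]:
    """Stage 1 (assumed form): keep only prompts accepted by a filter predicate."""
    return [p for p in prompts if keep(p)]


def preference_sample(samples: List[Sample], keep_ratio: float) -> List[Sample]:
    """Stage 2 (assumed form): keep the top fraction of samples by preference score."""
    k = max(1, int(len(samples) * keep_ratio))
    return sorted(samples, key=lambda s: s.preference, reverse=True)[:k]


def distribution_weights(
    samples: List[Sample],
    reference_mean: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Stage 3 (assumed form): down-weight samples far from a reference mean.

    Weights are exp(-distance / temperature), normalized to sum to 1, so a
    sample that drifts from the reference distribution contributes less.
    """
    distances = [math.dist(s.features, reference_mean) for s in samples]
    raw = [math.exp(-d / temperature) for d in distances]
    total = sum(raw)
    return [r / total for r in raw]
```

In such a loop, the selected samples and their weights would feed a weighted fine-tuning loss for the next generation of the model. How the reference distribution is estimated (for example, from real images) is left open here.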