Model Collapse in the Self-Consuming Chain of Diffusion Finetuning: A Novel Perspective from Quantitative Trait Modeling
The work addresses model degradation, a concern for researchers and practitioners who use diffusion models for image generation, although its contribution is incremental because it builds on existing concepts of model collapse.
The paper tackles model collapse in diffusion models that are iteratively finetuned on their own outputs. It shows that severe image quality degradation occurs universally, identifies the classifier-free guidance (CFG) scale as a key factor, and proposes ReDiFine, a plug-and-play strategy that operates robustly without hyperparameter tuning.
Model collapse, the severe degradation of generative models when iteratively trained on their own outputs, has gained significant attention in recent years. This paper examines the Chain of Diffusion, in which a pretrained text-to-image diffusion model is finetuned on its own generated images. We demonstrate that severe image quality degradation is universal and identify CFG scale as the key factor impacting this model collapse. Drawing on an analogy between the Chain of Diffusion and biological evolution, we then introduce a novel theoretical analysis based on quantitative trait modeling from statistical genetics. Our theoretical analysis aligns with empirical observations of the generated images in the Chain of Diffusion. Finally, we propose Reusable Diffusion Finetuning (ReDiFine), a simple yet effective strategy inspired by genetic mutations. It operates robustly across various scenarios without requiring any hyperparameter tuning, making it a plug-and-play solution for reusable image generation.
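The self-consuming loop described in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's setup. It replaces the diffusion model with a 1-D Gaussian, replaces finetuning with a maximum-likelihood refit, and does not model CFG scale or ReDiFine at all. The function name and its parameters are hypothetical, chosen only for this illustration. It shows the general mechanism by which a model repeatedly refit on its own samples tends to lose diversity.

```python
import numpy as np


def self_consuming_chain(n_samples=10, n_generations=200, seed=0):
    """Toy stand-in for a self-consuming chain (NOT the paper's diffusion setup).

    The "model" is a 1-D Gaussian. In each generation it is refit by maximum
    likelihood on samples drawn from the previous generation's fit.
    Returns the fitted standard deviation after each generation.
    """
    rng = np.random.default_rng(seed)
    mean, std = 0.0, 1.0  # generation 0 plays the role of the pretrained model
    stds = [std]
    for _ in range(n_generations):
        # "Generate" a synthetic training set from the current model.
        samples = rng.normal(mean, std, size=n_samples)
        # "Finetune" on it: refit the mean and the (biased, ddof=0) std.
        mean = samples.mean()
        std = samples.std()
        stds.append(std)
    return stds


stds = self_consuming_chain()
print(f"initial std = {stds[0]:.3f}, final std = {stds[-1]:.3e}")
```

With a small sample size per generation, the fitted spread drifts toward zero over many generations, which is a loose analogue of the loss of diversity reported for iteratively finetuned generative models. The exact trajectory depends on the seed and sample size, both of which are assumptions here.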