Stabilizing Self-Consuming Diffusion Models with Latent Space Filtering
This work addresses training instability in generative models caused by reusing synthetic data, a growing problem as AI-generated content proliferates online; the contribution is an incremental improvement over prior methods.
The paper tackles model collapse in self-consuming diffusion models by analyzing how their latent space degrades and proposing Latent Space Filtering (LSF), which filters out less realistic synthetic data. LSF mitigates collapse without increasing training cost or requiring human annotation, and it consistently outperforms existing baselines across multiple real-world datasets.
As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a ``self-consuming loop'' that can lead to training instability or \textit{model collapse}. Common strategies to address the issue, such as accumulating historical training data or injecting fresh real data, either increase computational cost or require expensive human annotation. In this paper, we empirically analyze the latent space dynamics of self-consuming diffusion models and observe that the low-dimensional structure of latent representations extracted from synthetic data degrades over generations. Based on this insight, we propose \textit{Latent Space Filtering} (LSF), a novel approach that mitigates model collapse by filtering out less realistic synthetic data from mixed datasets. Theoretically, we present a framework that connects latent space degradation to these empirical observations. Experimentally, we show that LSF consistently outperforms existing baselines across multiple real-world datasets, effectively mitigating model collapse without increasing training cost or relying on human annotation.
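The abstract does not specify how LSF scores samples. As a hedged illustration of the general idea (keep synthetic samples whose latents respect a low-dimensional structure, discard the rest), the sketch below fits a PCA subspace to reference latents with NumPy and keeps the candidates with the smallest off-subspace residual. The function names `fit_subspace`, `subspace_residuals`, and `filter_by_latent_structure`, the PCA-residual score, the `rank` and `keep_fraction` parameters, and the toy data are all hypothetical choices for illustration, not the paper's LSF criterion.

```python
import numpy as np


def fit_subspace(reference: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Fit a rank-`rank` affine subspace to reference latents via PCA (SVD)."""
    mean = reference.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    basis = vt[:rank].T  # shape (d, rank), orthonormal columns
    return mean, basis


def subspace_residuals(latents: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Norm of the component of each latent that lies off the fitted subspace."""
    centered = latents - mean
    off_subspace = centered - (centered @ basis) @ basis.T
    return np.linalg.norm(off_subspace, axis=1)


def filter_by_latent_structure(
    candidates: np.ndarray,
    reference: np.ndarray,
    rank: int,
    keep_fraction: float,
) -> np.ndarray:
    """Return sorted indices of candidates whose latents best fit the reference subspace.

    Hypothetical illustration only: the PCA-residual score and the keep-fraction
    threshold are assumptions, not the paper's LSF criterion.
    """
    mean, basis = fit_subspace(reference, rank)
    scores = subspace_residuals(candidates, mean, basis)
    n_keep = int(round(keep_fraction * len(candidates)))
    # Lowest residual = most consistent with the low-dimensional structure.
    return np.sort(np.argsort(scores, kind="stable")[:n_keep])


# Toy data (hypothetical): 8-dim latents whose "realistic" structure is a 2-dim subspace.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 8))
reference = rng.normal(size=(200, 2)) @ mixing
on_structure = rng.normal(size=(50, 2)) @ mixing
off_structure = rng.normal(size=(50, 2)) @ mixing + 0.5 * rng.normal(size=(50, 8))
candidates = np.vstack([on_structure, off_structure])

# Keep half of the pool; the on-structure rows (indices 0..49) have near-zero residuals.
kept = filter_by_latent_structure(candidates, reference, rank=2, keep_fraction=0.5)
print(kept[:5], len(kept))
```

Passing the mixed pool itself as `reference` would avoid needing labeled real data, at the cost of a subspace estimate that the off-structure samples also influence.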