Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations
This work addresses privacy and copyright risks in diffusion models by providing a practical, on-the-fly method to detect and suppress memorization without degrading image quality.
The paper shows that memorization in diffusion models causes internal numerical instability, which often appears as visually 'broken' artifacts. It proposes a step-wise detection and adaptive mitigation framework that, on Stable Diffusion 1.4, achieves a detection AUC above 0.999 and a 0.0% memorization rate after mitigation, with negligible overhead (about 0.01 s per image).
While diffusion models excel at generating high-quality images, their tendency to memorize training data poses significant privacy and copyright risks. In this work, we identify for the first time that memorization induces internal numerical instability, often manifesting as visually ``broken'' artifacts. Inspired by stability analysis in numerical methods, we introduce empirical stability regions, defined over latent update norms, to quantitatively characterize stable behavior during generation. Building on these regions, we propose a principled, on-the-fly framework for step-wise detection and adaptive mitigation. Our approach suppresses memorization without altering prompts or guidance, thereby preserving semantic fidelity and image quality. Extensive experiments on Stable Diffusion 1.4 demonstrate that our method achieves a detection AUC $>0.999$ and a $0.0\%$ memorization rate after mitigation, with negligible overhead ($\approx 0.01$\,s per image).
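To make the idea of an empirical stability region over latent update norms concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the update norm at step $t$ is the L2 norm of the difference between consecutive latents, and that the region is a per-step band of mean $\pm k$ standard deviations estimated from reference (non-memorized) generations. The function names `update_norms`, `fit_stability_region`, and `flag_unstable_steps`, and the parameter `k`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def update_norms(latents):
    """Return the L2 norm of each step-to-step latent update.

    `latents` is an array of shape (T + 1, ...) holding the latent at every
    sampling step; the result has shape (T,).
    """
    latents = np.asarray(latents, dtype=np.float64)
    diffs = latents[1:] - latents[:-1]
    # Flatten each update so the norm is taken over all latent dimensions.
    return np.linalg.norm(diffs.reshape(diffs.shape[0], -1), axis=1)


def fit_stability_region(reference_norms, k=3.0):
    """Estimate a per-step band from reference runs (assumed form: mean +/- k*std).

    `reference_norms` has shape (num_runs, T).
    """
    ref = np.asarray(reference_norms, dtype=np.float64)
    mean = ref.mean(axis=0)
    std = ref.std(axis=0)
    return mean - k * std, mean + k * std


def flag_unstable_steps(norms, lower, upper):
    """Mark the steps whose update norm falls outside the band."""
    norms = np.asarray(norms, dtype=np.float64)
    return (norms < lower) | (norms > upper)
```

In a sampling loop, a check of this kind could run after each denoising step, with any flagged step triggering a mitigation action. The form of that action is not specified in the abstract, so it is left out of the sketch.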