Elucidating Representation Degradation Problem in Diffusion Model Training
For researchers training diffusion models, this work addresses an overlooked optimization bottleneck that degrades representation quality, offering a plug-and-play solution to improve training efficiency and generation performance.
The paper identifies a training bottleneck in diffusion models, termed Representation Degradation: as noise levels increase, model outputs show progressive structural distortion, which can destabilize training and impair generation quality. The proposed ERD framework dynamically reallocates optimization effort according to effective recoverability, accelerating convergence and improving generation quality across diffusion backbones.
Diffusion models have achieved remarkable success, yet their training remains inefficient due to a severe optimization bottleneck, which we term Representation Degradation. As noise levels increase, the outputs of the trained model exhibit progressive structural distortion, which can destabilize training and impair generation quality. Our analysis suggests that this instability is driven by mismatched target recoverability, which is associated with Neural Tangent Kernel (NTK) spectral weakening and effective low-rank behavior. To address this, we propose Elucidated Representation Diffusion (ERD), a plug-and-play framework that dynamically reallocates optimization effort according to effective recoverability. By stabilizing representation learning without external supervision, ERD accelerates convergence and achieves strong empirical performance across diffusion backbones.
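To make the idea of reallocating optimization effort by recoverability concrete, the following is a minimal sketch and not the ERD procedure itself, which the abstract does not specify. It assumes a forward process x_t = alpha * x_0 + sigma * eps and uses alpha^2 / (alpha^2 + sigma^2) as a stand-in recoverability proxy (the fraction of variance a linear estimator can recover for unit-variance data). The function names, the `floor` parameter, and the choice to down-weight poorly recoverable noise levels are all illustrative assumptions.

```python
import math


def recoverability(alpha: float, sigma: float) -> float:
    """Hypothetical proxy in [0, 1]: share of x_t's variance carried by the signal.

    Assumes x_t = alpha * x_0 + sigma * eps with unit-variance x_0 and eps.
    Equivalent to SNR / (1 + SNR) with SNR = alpha^2 / sigma^2.
    """
    return (alpha * alpha) / (alpha * alpha + sigma * sigma)


def reallocation_weights(levels, floor=0.05):
    """Turn per-level proxies into loss weights that sum to 1.

    levels: list of (alpha, sigma) pairs, one per noise level.
    floor:  assumed lower bound so that no noise level is ignored entirely.
    """
    scores = [max(recoverability(a, s), floor) for a, s in levels]
    total = sum(scores)
    return [score / total for score in scores]


def weighted_loss(per_level_losses, weights):
    """Combine per-noise-level losses using the reallocation weights."""
    return sum(w * loss for w, loss in zip(weights, per_level_losses))


if __name__ == "__main__":
    # Variance-preserving levels: alpha = cos(t), sigma = sin(t), noise grows with t.
    levels = [(math.cos(t), math.sin(t)) for t in (0.2, 0.8, 1.4)]
    weights = reallocation_weights(levels)
    print([round(w, 3) for w in weights])
    print(weighted_loss([0.4, 0.9, 1.6], weights))
```

In this sketch the weights are recomputed from the noise schedule alone; a dynamic variant would presumably update the proxy from quantities measured during training, which is beyond what the abstract describes.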