Mean-Shift Distillation for Diffusion Mode Seeking
This work addresses mode-seeking issues in diffusion models used for generative AI. It is a novel method for a known bottleneck rather than a fundamental paradigm shift.
The paper tackles the problem of mode misalignment in diffusion models by introducing mean-shift distillation, a technique that provides a provably good proxy for the gradient of diffusion output distributions. The method shows superior mode alignment and improved convergence in synthetic and practical setups, yielding higher-fidelity results in text-to-image and text-to-3D applications with Stable Diffusion.
We present mean-shift distillation, a novel diffusion distillation technique that provides a provably good proxy for the gradient of the diffusion output distribution. The proxy is derived directly from mean-shift mode seeking on the distribution, and we show that its extrema are aligned with the modes of that distribution. We further derive an efficient product distribution sampling procedure to evaluate the gradient. Our method is formulated as a drop-in replacement for score distillation sampling (SDS), requiring neither model retraining nor extensive modification of the sampling procedure. We show that it exhibits superior mode alignment as well as improved convergence in both synthetic and practical setups, yielding higher-fidelity results when applied to both text-to-image and text-to-3D applications with Stable Diffusion.
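For readers unfamiliar with the mean-shift mode seeking that the abstract builds on, the following is a minimal sketch of the classical algorithm on a toy one-dimensional, two-mode dataset. It is not the paper's distillation procedure and does not involve a diffusion model. The function names (`mean_shift_step`, `seek_mode`), the Gaussian kernel, the bandwidth, and the synthetic data are all assumptions chosen for illustration. With a Gaussian kernel of bandwidth h, the mean-shift vector equals h² times the gradient of the log kernel density estimate, which is why its fixed points coincide with modes of that estimate.

```python
import numpy as np

def mean_shift_step(x, samples, bandwidth):
    """One Gaussian-kernel mean-shift update for a single point x.

    Returns the kernel-weighted mean of the samples around x. The
    mean-shift vector is (returned value - x), which for a Gaussian
    kernel equals bandwidth**2 times the gradient of the log KDE at x.
    """
    diffs = samples - x                          # shape (n, d)
    sq_dist = np.sum(diffs ** 2, axis=1)         # shape (n,)
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return weights @ samples / weights.sum()     # shape (d,)

def seek_mode(x0, samples, bandwidth=0.5, tol=1e-6, max_iter=500):
    """Iterate mean-shift updates from x0 until the shift is below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        new_x = mean_shift_step(x, samples, bandwidth)
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Hypothetical toy data: two well-separated 1-D clusters.
rng = np.random.default_rng(0)
samples = np.concatenate([
    rng.normal(-2.0, 0.3, size=(500, 1)),
    rng.normal(3.0, 0.3, size=(500, 1)),
])

# Each start point climbs to the mode of its nearest cluster.
mode_left = seek_mode([-1.0], samples)
mode_right = seek_mode([2.0], samples)
print(mode_left, mode_right)
```

Starting points on either side of the gap converge to different modes, which illustrates the mode-seeking behavior that the abstract contrasts with SDS. In this sketch the bandwidth controls how smooth the estimated density is: a larger value can merge nearby modes, a smaller one can create spurious ones.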