Plug-and-Play Diffusion Distillation
This work addresses the computational bottleneck of diffusion models for image generation and offers a practical improvement for users who need faster inference. The contribution is incremental, since it builds on existing distillation and guidance techniques.
The paper tackles the slow inference of diffusion models by proposing a plug-and-play distillation method: an external lightweight guide model is trained while the original model stays frozen. The method reduces inference computation by almost half, requires trainable parameters amounting to only 1% of the base model, and maintains visual fidelity with as few as 8 to 16 steps.
Diffusion models have shown tremendous results in image generation. However, due to the iterative nature of the diffusion process and its reliance on classifier-free guidance, inference times are slow. In this paper, we propose a new distillation approach for guided diffusion models in which an external lightweight guide model is trained while the original text-to-image model remains frozen. We show that our method reduces the inference computation of classifier-free guided latent-space diffusion models by almost half, and only requires 1% trainable parameters of the base model. Furthermore, once trained, our guide model can be applied to various fine-tuned, domain-specific versions of the base diffusion model without the need for additional training: this "plug-and-play" functionality drastically reduces inference computation while maintaining the visual fidelity of generated images. Empirically, we show that our approach is able to produce visually appealing results and achieve a comparable FID score to the teacher with as few as 8 to 16 steps.
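The "almost half" figure can be made concrete with a minimal Python sketch. This is an illustration under stated assumptions, not the paper's implementation. It uses the standard classifier-free guidance combination, which needs both a conditional and an unconditional pass of the base model at every step, and compares that cost with one base-model pass plus one pass of a lightweight guide. The function names and the guide_fraction value (the guide's cost relative to one base-model pass) are hypothetical; the abstract does not report that number.

```python
from typing import List


def classifier_free_guidance(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Standard CFG combination: uncond + scale * (cond - uncond).

    `cond` and `uncond` stand in for the two noise predictions that the
    base model must produce at every step, hence two forward passes.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def cfg_cost(steps: int, base_cost: float = 1.0) -> float:
    """Cost of standard CFG sampling: two base-model passes per step."""
    return 2 * steps * base_cost


def guided_cost(steps: int, base_cost: float = 1.0, guide_fraction: float = 0.05) -> float:
    """Cost with an external guide: one base-model pass plus one guide pass per step.

    `guide_fraction` is a hypothetical value chosen for illustration only.
    """
    return steps * base_cost * (1.0 + guide_fraction)


def relative_cost(steps: int, guide_fraction: float = 0.05) -> float:
    """Guided cost as a fraction of the standard CFG cost at the same step count."""
    return guided_cost(steps, guide_fraction=guide_fraction) / cfg_cost(steps)


if __name__ == "__main__":
    for steps in (8, 16):
        print(steps, cfg_cost(steps), guided_cost(steps), relative_cost(steps))
```

Under these assumptions the ratio does not depend on the number of steps: it equals (1 + guide_fraction) / 2, which approaches one half as the guide becomes cheaper relative to the base model. Running with fewer steps, such as the 8 to 16 mentioned in the abstract, is a separate saving on top of this per-step reduction.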