LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
For practitioners deploying lightweight diffusion models, this provides a stable and effective distillation method that works across architectures and tasks, especially under extreme compression.
The paper addresses a core difficulty in knowledge distillation (KD) for diffusion models: the teacher's complex denoising process hinders student learning. It proposes a coarse-to-fine framework (LIFT and PLACE) that decomposes training into coarse alignment and fine refinement, with adaptive error-based guidance. The method converges stably and reaches an FID of 15.73 even with a 1.3M-parameter student (1.6% of the teacher's size), a setting where conventional KD fails with FIDs of 50-200+.
We demonstrate that in knowledge distillation for diffusion models, the teacher network's highly complex denoising process, which stems from its substantially larger capacity, is difficult for the student model to mimic faithfully. To address this problem, we propose a coarse-to-fine distillation framework with LInear FiTting-based distillation (LIFT) and Piecewise Local Adaptive Coefficient Estimation (PLACE). First, LIFT decomposes the objective into a "coarse" alignment and a "fine" refinement, and the student is trained on the coarse alignment before proceeding to the harder fine refinement. Second, PLACE extends LIFT to address spatially non-uniform errors by partitioning outputs into error-based groups, providing locally adaptive guidance. Our experiments show that LIFT and PLACE are effective across diffusion spaces (image/latent), backbones (U-Net/DiT), tasks (unconditional/conditional), and datasets, and that they even extend to flow-based models such as MMDiT (SD3). Furthermore, under extreme compression with a 1.3M-parameter student (only 1.6% of the teacher), conventional KD fails to provide sufficient guidance for stable training, with FID scores often degrading to 50-200+, whereas our method converges stably and achieves an FID of 15.73.
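The abstract does not state the exact LIFT or PLACE objectives, so the following is only a hypothetical sketch of how a coarse-to-fine, linear-fitting-based loss could be organized. It assumes that "linear fitting" means a least-squares affine fit a·s + b ≈ t between student output s and teacher output t, that the coarse term is the error remaining after that fit, that a PLACE-style variant groups elements by quantiles of the per-element error |s - t| and fits one (a, b) pair per group, and that training shifts weight linearly from the coarse term to the fine term. All function names (`affine_fit`, `coarse_loss`, `fine_loss`, `piecewise_coarse_loss`, `scheduled_loss`), the group count, and the linear schedule are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def affine_fit(s, t, eps=1e-12):
    """Least-squares scalars (a, b) minimizing mean((a*s + b - t)**2). Assumed form of 'linear fitting'."""
    s_mean, t_mean = s.mean(), t.mean()
    var = ((s - s_mean) ** 2).mean()
    if var < eps:  # constant student output: the best affine fit is the teacher mean
        return 0.0, float(t_mean)
    a = ((s - s_mean) * (t - t_mean)).mean() / var
    return float(a), float(t_mean - a * s_mean)


def coarse_loss(s, t):
    """Hypothetical coarse term: error left after the best global affine fit of s to t."""
    a, b = affine_fit(s.ravel(), t.ravel())
    return float(((a * s + b - t) ** 2).mean())


def fine_loss(s, t):
    """Fine term: plain element-wise MSE between student and teacher outputs."""
    return float(((s - t) ** 2).mean())


def piecewise_coarse_loss(s, t, num_groups=4):
    """Hypothetical PLACE-style term: group elements by |s - t|, fit (a, b) per group."""
    s_flat, t_flat = s.ravel(), t.ravel()
    err = np.abs(s_flat - t_flat)
    # Quantile edges split elements into groups of roughly equal size by error magnitude.
    edges = np.quantile(err, np.linspace(0.0, 1.0, num_groups + 1))
    group = np.clip(np.searchsorted(edges, err, side="right") - 1, 0, num_groups - 1)
    total = 0.0
    for g in range(num_groups):
        mask = group == g
        if not mask.any():
            continue
        a, b = affine_fit(s_flat[mask], t_flat[mask])
        total += float(((a * s_flat[mask] + b - t_flat[mask]) ** 2).sum())
    return total / s_flat.size


def scheduled_loss(s, t, step, total_steps):
    """Assumed linear schedule: all coarse at step 0, all fine at step >= total_steps."""
    w = min(step / total_steps, 1.0)
    return (1.0 - w) * coarse_loss(s, t) + w * fine_loss(s, t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.standard_normal((4, 8, 8))
    student = 0.5 * teacher + 0.3 * rng.standard_normal((4, 8, 8))
    print("coarse:", coarse_loss(student, teacher))
    print("piecewise coarse:", piecewise_coarse_loss(student, teacher))
    print("fine:", fine_loss(student, teacher))
```

Because each group may choose its own coefficients, the piecewise term can never exceed the global coarse term, which matches the intuition of "locally adaptive" guidance. This NumPy version only computes loss values; in an actual training loop one would use an autograd framework, and one possible design choice (again an assumption) is to treat the fitted coefficients as constants when backpropagating into the student.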