LG · CV · Feb 11, 2025

Towards Training One-Step Diffusion Models Without Distillation

Cambridge
arXiv:2502.08005v3 · 7 citations · h-index: 8
AI Analysis

This work addresses the inefficiency of the two-stage training pipeline used for one-step diffusion models. The contribution is incremental, since the proposed methods still depend on initializing the student with the teacher's weights.

The paper asks whether one-step diffusion models can be trained without distillation from a teacher model. It shows that training methods using no teacher score supervision can outperform most teacher-guided distillation approaches, but that initializing the student with the teacher's weights remains critical because of the feature representations those weights provide.

Recent advances in training one-step diffusion models typically follow a two-stage pipeline: first training a teacher diffusion model and then distilling it into a one-step student model. This process often depends on both the teacher's score function for supervision and its weights for initializing the student model. In this paper, we explore whether one-step diffusion models can be trained directly without this distillation procedure. We introduce a family of new training methods that entirely forgo teacher score supervision, yet outperform most teacher-guided distillation approaches. This suggests that score supervision is not essential for effective training of one-step diffusion models. However, we find that initializing the student model with the teacher's weights remains critical. Surprisingly, the key advantage of teacher initialization is not due to better latent-to-output mappings, but rather the rich set of feature representations across different noise levels that the teacher diffusion model provides. These insights take us one step closer towards training one-step diffusion models without distillation and provide a better understanding of the roles of teacher supervision and initialization in the distillation process.
