Stochastic Transition-Map Distillation for Fast Probabilistic Inference
For practitioners of diffusion models, STMD is a teacher-free distillation method that reduces sampling to one or a few steps while keeping inference probabilistic, enabling efficient deployment in downstream tasks such as inverse problems.
Stochastic Transition-Map Distillation (STMD) reduces diffusion model inference to one or a few steps while preserving stochastic sampling, achieving competitive generation quality on MNIST, CIFAR-10, and CelebA without requiring a pretrained teacher or trajectory caching.
Diffusion models achieve strong generation quality, diversity, and distribution coverage, but this performance often comes at the cost of expensive inference. In this work, we propose Stochastic Transition-Map Distillation (STMD), a teacher-free framework for accelerating diffusion model inference while preserving probabilistic sample generation. In contrast to score-based diffusion models, whose denoising parameterization models only the mean of the posterior distribution, STMD distills the full transition map associated with the sampling stochastic differential equation (SDE). We parameterize these SDE transitions with a conditional Mean Flow model, yielding a one- or few-step stochastic sampler that retains the transition structure of the underlying diffusion process. This perspective is especially useful for downstream tasks that require stochastic inference, such as diffusion posterior sampling, inverse problems, and energy-based fine-tuning. Unlike recent distillation methods, STMD requires no pretrained teacher, bi-level optimization, or trajectory simulation and caching, enabling efficient and scalable training. We derive convergence bounds for our method in the Wasserstein distance and validate STMD on image generation with the MNIST, CIFAR-10, and CelebA datasets.
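To make the distinction between a posterior-mean map and a full stochastic transition map concrete, the following is a minimal toy sketch, not the STMD training procedure or its Mean Flow parameterization. It assumes one-dimensional data x0 ~ N(0, 1) and a variance-exploding forward process x_sigma = x0 + sigma * eps, for which the reverse-time transition from noise level sigma_t to sigma_s is Gaussian and known in closed form. This analytic transition stands in for a learned transition map; the names `transition` and `sample` and the noise schedules are hypothetical choices made for illustration.

```python
import numpy as np

def transition(x_t, sigma_t, sigma_s, z):
    """Closed-form reverse-time transition x_t -> x_s (sigma_s < sigma_t).

    Toy assumption: x0 ~ N(0, 1) and x_sigma = x0 + sigma * eps, so the
    conditional law of x_s given x_t is Gaussian. Setting z = 0 keeps only
    the conditional mean, which mimics a mean-only (denoiser-style) map.
    """
    a = (1.0 + sigma_s**2) / (1.0 + sigma_t**2)
    var = (1.0 + sigma_s**2) * (sigma_t**2 - sigma_s**2) / (1.0 + sigma_t**2)
    return a * x_t + np.sqrt(var) * z

def sample(n, sigmas, rng, stochastic=True):
    """Run a few-step sampler along a decreasing noise schedule `sigmas`."""
    # Initialize from the marginal at the largest noise level.
    x = np.sqrt(1.0 + sigmas[0] ** 2) * rng.standard_normal(n)
    for sigma_t, sigma_s in zip(sigmas[:-1], sigmas[1:]):
        # Fresh noise at every step is what keeps the sampler stochastic.
        z = rng.standard_normal(n) if stochastic else np.zeros(n)
        x = transition(x, sigma_t, sigma_s, z)
    return x

rng = np.random.default_rng(0)
n = 200_000
one_step = sample(n, [10.0, 0.0], rng)
four_step = sample(n, [10.0, 3.0, 1.0, 0.3, 0.0], rng)
mean_only = sample(n, [10.0, 0.0], rng, stochastic=False)

# Sample variances; the data distribution has variance 1.
print(one_step.var(), four_step.var(), mean_only.var())
```

In this toy setting, both the one-step and the four-step stochastic samplers recover a sample variance close to 1, because each step draws from the exact transition. The mean-only single step collapses the variance to roughly 1 / (1 + 10^2), which illustrates why a one-step map that models only the posterior mean cannot by itself reproduce the target distribution.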