Three Creates All: You Only Sample 3 Steps

Yuren Cai, Guangyi Wang, Zongqing Li, Li Li, Zhihui Liu, Songzhi Su

arXiv:2603.22375 | 50.7 | h-index: 2

AI Analysis

This work targets the inference bottleneck in diffusion models to speed up generation. The contribution is incremental: it adds a plug-and-play optimization on top of existing methods.

The paper tackles the slow inference of diffusion models by proposing Multi-layer Time Embedding Optimization (MTEO), which distills step-wise, layer-wise time embeddings to improve few-step sampling. It reports state-of-the-art few-step performance and a narrower gap between distillation-based and lightweight methods.

Diffusion models deliver high-fidelity generation but remain slow at inference time due to many sequential network evaluations. We find that standard timestep conditioning becomes a key bottleneck for few-step sampling. Motivated by layer-dependent denoising dynamics, we propose Multi-layer Time Embedding Optimization (MTEO), which freezes the pretrained diffusion backbone and distills a small set of step-wise, layer-wise time embeddings from reference trajectories. MTEO is plug-and-play with existing ODE solvers, adds no inference-time overhead, and trains only a tiny fraction of parameters. Extensive experiments across diverse datasets and backbones show state-of-the-art performance in few-step sampling and substantially narrow the gap between distillation-based and lightweight methods. Code will be available.
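To make the idea in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation, and every name in it (`backbone`, `euler_sample`, `default_table`, `distill`) is hypothetical. It assumes a scalar three-layer network with fixed weights standing in for the pretrained backbone, an Euler solver standing in for the ODE solver, and a finite-difference gradient standing in for backpropagation. The only trained quantity is a table with one embedding per (step, layer) pair, fitted so that 3-step sampling matches the endpoints of a 60-step reference run.

```python
import math

NUM_LAYERS = 3    # depth of the toy backbone (hypothetical)
NUM_STEPS = 3     # few-step sampling budget
REF_STEPS = 60    # fine-grained solver used to build reference endpoints
W = (0.9, 0.8, 0.7)  # frozen "pretrained" weights; a tuple, so never updated

def shared_embedding(t):
    """Standard conditioning: one embedding of t, reused by every layer."""
    return math.sin(t)

def backbone(x, layer_embs):
    """Toy velocity network; layer l adds its own embedding layer_embs[l]."""
    h = x
    for w, e in zip(W, layer_embs):
        h = math.tanh(w * h + e)
    return h

def euler_sample(x, table):
    """Euler solver from t=1 to t=0; row k of `table` conditions step k."""
    dt = 1.0 / len(table)
    for layer_embs in table:
        x = x - dt * backbone(x, layer_embs)
    return x

def default_table(num_steps):
    """Baseline table: every layer in a step gets the same shared embedding."""
    return [[shared_embedding(1.0 - k / num_steps)] * NUM_LAYERS
            for k in range(num_steps)]

def loss(table, starts, targets):
    """Mean squared error between few-step and reference endpoints."""
    errs = [(euler_sample(x, table) - y) ** 2 for x, y in zip(starts, targets)]
    return sum(errs) / len(errs)

def distill(table, starts, targets, iters=100, lr=0.5, eps=1e-5):
    """Fit only the table entries; finite differences stand in for backprop."""
    table = [row[:] for row in table]  # work on a copy, leave the input intact
    for _ in range(iters):
        base = loss(table, starts, targets)
        grad = []
        for k in range(len(table)):
            g_row = []
            for l in range(len(table[k])):
                bumped = [row[:] for row in table]
                bumped[k][l] += eps
                g_row.append((loss(bumped, starts, targets) - base) / eps)
            grad.append(g_row)
        candidate = [[e - lr * g for e, g in zip(row, g_row)]
                     for row, g_row in zip(table, grad)]
        if loss(candidate, starts, targets) < base:
            table = candidate   # accept only steps that reduce the loss
        else:
            lr *= 0.5           # otherwise shrink the step size
    return table

starts = [-1.0, -0.5, 0.0, 0.5, 1.0]
targets = [euler_sample(x, default_table(REF_STEPS)) for x in starts]
baseline = default_table(NUM_STEPS)
learned = distill(baseline, starts, targets)
initial_loss = loss(baseline, starts, targets)
final_loss = loss(learned, starts, targets)
print(f"shared embedding: {initial_loss:.6f}  learned table: {final_loss:.6f}")
```

In this sketch the sampler is the same function before and after distillation. Only the contents of the 3 x 3 table change, which mirrors the abstract's claims of no inference-time overhead and a tiny trainable parameter count. A real setup would presumably use high-dimensional embeddings and gradient-based training on a full diffusion backbone.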
