CV · Mar 20, 2025

Scale-wise Distillation of Diffusion Models

arXiv:2503.16397v1 · 10 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the efficiency of text-to-image diffusion models, a practical concern for AI practitioners. It is an incremental advance that builds on existing distillation methods.

The paper tackles the computational cost of diffusion models with a scale-wise distillation framework: generation starts at a low resolution, and the sample is upscaled as denoising proceeds. The resulting generator approaches the inference time of two full-resolution steps and outperforms other methods under the same compute budget, according to both automated metrics and human preference studies.
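The scale-wise idea can be sketched as a sampling loop that predicts a clean sample at the current resolution, upscales it, and re-noises it to the next noise level before the next step. This is a minimal sketch under stated assumptions, not the paper's implementation: `toy_denoiser` is a hypothetical placeholder for a distilled generator, nearest-neighbour upsampling and the flow-matching-style re-noising rule are illustrative choices, and the resolution and noise schedules are made up. Compute is tallied in units of one full-resolution step, assuming per-step cost is proportional to the number of spatial positions.

```python
import numpy as np


def resize_nearest(x: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a (C, H, W) array to (C, size, size)."""
    _, h, w = x.shape
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    return x[:, rows][:, :, cols]


def toy_denoiser(x_t: np.ndarray, t: float) -> np.ndarray:
    """HYPOTHETICAL stand-in for a distilled few-step generator.

    Returns an estimate of the clean sample x0; a real model would be a
    text-conditioned network, not this placeholder.
    """
    return (1.0 - t) * x_t


def scale_wise_sample(scales, timesteps, channels=4, seed=0, denoiser=toy_denoiser):
    """One denoising step per scale, upscaling the sample between steps.

    Returns the final x0 estimate and the compute spent, in units of one
    full-resolution step, under the ASSUMPTION that a step's cost grows
    linearly with the number of spatial positions (H * W).
    """
    assert scales and len(scales) == len(timesteps)
    rng = np.random.default_rng(seed)
    full = scales[-1]
    # Start from pure noise at the coarsest resolution.
    x_t = rng.standard_normal((channels, scales[0], scales[0]))
    cost = 0.0
    for i, (size, t) in enumerate(zip(scales, timesteps)):
        x0 = denoiser(x_t, t)          # predict the clean sample at this scale
        cost += (size / full) ** 2     # relative cost of a step at this size
        if i + 1 < len(scales):
            next_size, next_t = scales[i + 1], timesteps[i + 1]
            x0_up = resize_nearest(x0, next_size)  # move to the next scale
            noise = rng.standard_normal(x0_up.shape)
            # Re-noise to the next (lower) noise level; the interpolation
            # x_t = (1 - t) * x0 + t * noise is an ASSUMPTION (flow matching style).
            x_t = (1.0 - next_t) * x0_up + next_t * noise
    return x0, cost


# HYPOTHETICAL latent-resolution and noise-level schedules.
x0, cost = scale_wise_sample([32, 48, 64, 96, 128], [1.0, 0.8, 0.6, 0.4, 0.2])
print(x0.shape, cost)  # (4, 128, 128) 2.015625
```

With this hypothetical five-step schedule the tally is about 2.02 full-resolution steps, versus 5.0 for five steps at full size: the early, high-noise steps run on few positions and are nearly free. Because attention cost grows faster than linearly in the number of positions, the linear assumption, if anything, overstates the cost of the coarse steps.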

We present SwD, a scale-wise distillation framework for diffusion models (DMs) that brings next-scale prediction ideas to diffusion-based few-step generators. SwD is inspired by recent insights relating diffusion processes to implicit spectral autoregression. We hypothesize that DMs can initiate generation at lower data resolutions and gradually upscale the samples at each denoising step without loss in performance, while significantly reducing computational costs. SwD naturally integrates this idea into existing diffusion distillation methods based on distribution matching. We also enrich the family of distribution matching approaches with a novel patch loss that enforces finer-grained similarity to the target distribution. When applied to state-of-the-art text-to-image diffusion models, SwD approaches the inference time of two full-resolution steps and significantly outperforms its counterparts under the same computation budget, as evidenced by automated metrics and human preference studies.
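The abstract does not define the patch loss in detail. A minimal sketch, assuming it compares the distribution of per-location feature vectors ("patches") between generated and reference samples using a kernel maximum mean discrepancy (MMD): the feature maps, the Gaussian kernel, the bandwidth, and the helper names `to_patches`, `rbf_kernel` and `patch_mmd` are all hypothetical illustration choices, not necessarily what the paper uses.

```python
import numpy as np


def to_patches(feats: np.ndarray) -> np.ndarray:
    """Flatten a (C, H, W) feature map into H*W patch vectors of size C."""
    return feats.reshape(feats.shape[0], -1).T


def rbf_kernel(a: np.ndarray, b: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))


def patch_mmd(feats_gen: np.ndarray, feats_ref: np.ndarray, bandwidth: float = 1.0) -> float:
    """Biased squared-MMD estimate between two sets of patch features.

    Each spatial location is treated as one sample, so the loss compares
    the distributions of patches rather than matching them position by position.
    """
    x, y = to_patches(feats_gen), to_patches(feats_ref)
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 16, 16))   # HYPOTHETICAL reference features
shuffled = ref[:, rng.permutation(16)]   # same patches, rows moved around
shifted = ref + 1.0                      # every patch moved in feature space
print(patch_mmd(shuffled, ref))          # near zero: same patch distribution
print(patch_mmd(shifted, ref))           # positive: patch distribution changed
```

Because each location counts as one sample, rearranging positions leaves the loss near zero, while moving the patch distribution increases it. In training, the same expression would be computed in an autodiff framework so gradients reach the generator; NumPy here only evaluates the value.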

