CV · Mar 6

Cross-Resolution Distribution Matching for Diffusion Distillation

arXiv:2603.06136v1 · 1 citation
Predicted impact: top 10% in CV · last 90 days
Originality: Highly original
AI Analysis

This work addresses a bottleneck in accelerating image and video generation for AI applications, offering a novel method to improve efficiency without sacrificing quality.

The paper tackles the quality degradation that occurs in diffusion distillation when some timesteps are generated at low resolution. It proposes Cross-Resolution Distribution Matching Distillation (RMD) to bridge the cross-resolution distribution gaps, reporting up to 33.4X speedup on SDXL and 25.6X on Wan2.1-14B while preserving high visual fidelity.

Diffusion distillation is central to accelerating image and video generation, yet existing methods are fundamentally limited by the denoising process, where step reduction has largely saturated. Partial timestep low-resolution generation can further accelerate inference, but it suffers noticeable quality degradation due to cross-resolution distribution gaps. We propose Cross-Resolution Distribution Matching Distillation (RMD), a novel distillation framework that bridges cross-resolution distribution gaps for high-fidelity, few-step multi-resolution cascaded inference. Specifically, RMD divides the timestep intervals for each resolution using logarithmic signal-to-noise ratio (logSNR) curves, and introduces logSNR-based mapping to compensate for resolution-induced shifts. Distribution matching is conducted along resolution trajectories to reduce the gap between low-resolution generator distributions and the teacher's high-resolution distribution. In addition, a predicted-noise re-injection mechanism is incorporated during upsampling to stabilize training and improve synthesis quality. Quantitative and qualitative results show that RMD preserves high-fidelity generation while accelerating inference across various backbones. Notably, RMD achieves up to 33.4X speedup on SDXL and 25.6X on Wan2.1-14B, while preserving high visual fidelity.
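
The logSNR-based mapping mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a rectified-flow schedule x_t = (1 - t) * x0 + t * eps, and it assumes the commonly used rule of thumb that average-pooling by a factor `scale` per side raises the SNR by `scale**2`. The function names `logsnr`, `t_from_logsnr`, and `map_timestep` are hypothetical.

```python
import math


def logsnr(t: float) -> float:
    """logSNR of the assumed schedule x_t = (1 - t) * x0 + t * eps, for 0 < t < 1."""
    return 2.0 * math.log((1.0 - t) / t)


def t_from_logsnr(lam: float) -> float:
    """Inverse of logsnr: solve 2 * log((1 - t) / t) = lam for t."""
    return 1.0 / (1.0 + math.exp(lam / 2.0))


def map_timestep(t_high: float, scale: float) -> float:
    """Map a high-resolution timestep to a low-resolution timestep.

    Assumption (not taken from the paper): downsampling by `scale` per side
    shifts logSNR by 2 * log(scale), so the matching low-resolution timestep
    satisfies logsnr(t_low) = logsnr(t_high) + 2 * log(scale).
    """
    lam_low = logsnr(t_high) + 2.0 * math.log(scale)
    return t_from_logsnr(lam_low)


if __name__ == "__main__":
    # Compare a few high-resolution timesteps with their 2x-downsampled counterparts.
    for t in (0.9, 0.5, 0.1):
        print(f"t_high={t:.2f} -> t_low={map_timestep(t, 2.0):.4f}")
```

Under these assumptions the mapped low-resolution timestep is always smaller (less noisy) than the high-resolution one, and `scale = 1` leaves the timestep unchanged. The schedule and shift rule RMD actually uses may differ.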
