CV · Mar 24, 2025

Training-free Diffusion Acceleration with Bottleneck Sampling

arXiv:2503.18940v2 · 20 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of deploying diffusion models for visual content generation by reducing inference cost without retraining, though it is an incremental improvement over existing acceleration methods.

The paper tackles the high computational cost of diffusion models during inference by introducing Bottleneck Sampling, a training-free framework that leverages low-resolution priors to accelerate inference by up to 3× for image generation and 2.5× for video generation while maintaining comparable output quality.

Diffusion models have demonstrated remarkable capabilities in visual content generation but remain challenging to deploy due to their high computational cost during inference. This computational burden primarily arises from the quadratic complexity of self-attention with respect to image or video resolution. While existing acceleration methods often compromise output quality or necessitate costly retraining, we observe that most diffusion models are pre-trained at lower resolutions, presenting an opportunity to exploit these low-resolution priors for more efficient inference without degrading performance. In this work, we introduce Bottleneck Sampling, a training-free framework that leverages low-resolution priors to reduce computational overhead while preserving output fidelity. Bottleneck Sampling follows a high-low-high denoising workflow: it performs high-resolution denoising in the initial and final stages while operating at lower resolutions in intermediate steps. To mitigate aliasing and blurring artifacts, we further refine the resolution transition points and adaptively shift the denoising timesteps at each stage. We evaluate Bottleneck Sampling on both image and video generation tasks, where extensive experiments demonstrate that it accelerates inference by up to 3× for image generation and 2.5× for video generation, all while maintaining output quality comparable to the standard full-resolution sampling process across multiple evaluation metrics.
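The high-low-high workflow and the quadratic attention cost described above can be illustrated with a rough back-of-the-envelope sketch. This is not the paper's implementation: the patch size, the 20-step budget, the stage lengths, and the resolutions are all assumptions chosen for illustration, and the model counts only self-attention cost, ignoring every other layer.

```python
def attention_cost(height, width, patch=16):
    """Relative self-attention cost, quadratic in the number of tokens.

    The patch size of 16 is an assumed value, not taken from the paper.
    """
    tokens = (height // patch) * (width // patch)
    return tokens ** 2


def schedule_cost(stages, patch=16):
    """Total attention cost over a list of (height, width, num_steps) stages."""
    return sum(steps * attention_cost(h, w, patch) for h, w, steps in stages)


# Hypothetical schedules: both use 20 denoising steps in total.
baseline = [(1024, 1024, 20)]
bottleneck = [
    (1024, 1024, 4),   # initial high-resolution stage
    (512, 512, 12),    # intermediate low-resolution stage
    (1024, 1024, 4),   # final high-resolution stage
]

# Ratio of attention-only cost; a real end-to-end speedup will differ.
speedup = schedule_cost(baseline) / schedule_cost(bottleneck)
print(f"attention-only speedup: {speedup:.2f}x")
```

Under these assumptions, halving each side of the image cuts the token count by 4 and the attention cost by 16, so most of the remaining cost comes from the high-resolution stages at the start and end.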

