Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation
This work addresses the slow inference of diffusion models, a practical concern for users of generative AI, and offers a new method for speeding it up. The contribution is incremental, however, since it adapts existing techniques to a new setting.
The paper tackles the inference-time bottleneck in Denoising Diffusion Probabilistic Models (DDPMs) by proving an exchangeability property, which enables near-black-box adaptation of optimization techniques from autoregressive models. It then introduces Autospeculative Decoding (ASD), which achieves a $\tilde{O}(K^{\frac{1}{3}})$ parallel runtime speedup over the $K$-step sequential DDPM. A practical implementation shows significant acceleration in various domains.
Denoising Diffusion Probabilistic Models (DDPMs) have emerged as powerful tools for generative modeling. However, their inherently sequential sampling procedure creates a significant inference-time bottleneck. In this work, we use the connection between DDPMs and Stochastic Localization to prove that, under an appropriate reparametrization, the increments of a DDPM satisfy an exchangeability property. This general insight enables near-black-box adaptation of various performance optimization techniques from autoregressive models to the diffusion setting. To demonstrate this, we introduce \emph{Autospeculative Decoding} (ASD), an extension of the widely used speculative decoding algorithm to DDPMs that does not require any auxiliary draft model. Our theoretical analysis shows that ASD achieves a $\tilde{O} (K^{\frac{1}{3}})$ parallel runtime speedup over the $K$-step sequential DDPM. We also demonstrate that a practical implementation of autospeculative decoding significantly accelerates DDPM inference in various domains.
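To make the exchangeability claim concrete, here is a toy sketch under stated assumptions; it is not the ASD algorithm. It assumes the standard stochastic-localization observation process $y_t = tX + B_t$, with $B$ a standard Brownian motion, on a uniform time grid of step $h$. The paper's exact reparametrization of the DDPM may differ. Conditionally on $X$, the increments $\Delta_i = y_{ih} - y_{(i-1)h}$ are i.i.d. $\mathcal{N}(hX, h)$, so unconditionally they are exchangeable. The sketch uses a hypothetical one-dimensional prior $X \sim \mathrm{Uniform}\{-1,+1\}$, for which the posterior mean (the ideal denoiser) is $\tanh(y_t)$. It samples paths with a sequential sampler that never sees $X$ and checks a necessary consequence of exchangeability: every increment has the same variance, $h^2 + h$, and every pair has the same covariance, $h^2 \operatorname{Var}(X) = h^2$. The function names `sample_increments` and `covariance` are illustrative only.

```python
import math
import random


def sample_increments(num_steps, h, rng):
    """Sample one path of increments y_{ih} - y_{(i-1)h} sequentially.

    Toy assumption (hypothetical): prior X ~ Uniform{-1, +1} and
    y_t = t * X + B_t. The sampler never sees X; it only uses the
    posterior given y_t, whose mean (the ideal denoiser) is tanh(y_t).
    """
    y = 0.0  # y_0 = 0
    increments = []
    for _ in range(num_steps):
        # P(X = +1 | y_t) = (1 + tanh(y_t)) / 2 for the +/-1 prior.
        p_plus = 0.5 * (1.0 + math.tanh(y))
        x_draw = 1.0 if rng.random() < p_plus else -1.0
        # Exact transition: y_{t+h} - y_t = h * X + (B_{t+h} - B_t),
        # where the Brownian increment ~ N(0, h) is independent of the past.
        delta = h * x_draw + rng.gauss(0.0, math.sqrt(h))
        increments.append(delta)
        y += delta
    return increments


def covariance(paths, i, j):
    """Empirical covariance between increment i and increment j."""
    n = len(paths)
    mean_i = sum(p[i] for p in paths) / n
    mean_j = sum(p[j] for p in paths) / n
    return sum((p[i] - mean_i) * (p[j] - mean_j) for p in paths) / n


K, H, N = 4, 0.5, 100_000  # steps, step size, number of sampled paths
rng = random.Random(0)
paths = [sample_increments(K, H, rng) for _ in range(N)]

variances = [covariance(paths, i, i) for i in range(K)]
covariances = {
    (i, j): covariance(paths, i, j) for i in range(K) for j in range(i + 1, K)
}

# Exchangeability implies one shared variance and one shared covariance.
# Theory for this toy: Var = h^2 + h = 0.75, Cov(i != j) = h^2 * Var(X) = 0.25.
print("variances:  ", [round(v, 3) for v in variances])
print("covariances:", {k: round(c, 3) for k, c in covariances.items()})
```

Only second moments are checked here, so this is a sanity test of a necessary condition rather than a proof. The increments are not independent: their shared covariance $h^2$ comes entirely from the common latent $X$, which is the conditionally-i.i.d. (hence exchangeable) structure described above.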