CV · Jul 20, 2025

Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models

arXiv:2507.14797v1 · 12 citations · h-index: 6 · Has Code
Originality: Highly original
AI Analysis

This paper addresses the slow sampling of diffusion models for image synthesis, offering a plugin that accelerates existing ODE samplers with minimal training overhead.

The paper tackles the high sampling latency of diffusion models by proposing a novel ODE solver that uses parallel gradient evaluations to reduce truncation errors while maintaining low latency, achieving FID scores of 4.47 on CIFAR-10 and 7.97 on FFHQ at 5 NFE.

Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face image quality degradation under a low-latency budget. In this paper, we propose the Ensemble Parallel Direction solver (dubbed EPD), a novel ODE solver that mitigates truncation errors by incorporating multiple parallel gradient evaluations in each ODE step. Importantly, since the additional gradient computations are independent, they can be fully parallelized, preserving low-latency sampling. Our method optimizes a small set of learnable parameters in a distillation fashion, ensuring minimal training overhead. In addition, our method can serve as a plugin to improve existing ODE samplers. Extensive experiments on various image synthesis benchmarks demonstrate the effectiveness of EPD in achieving high-quality and low-latency sampling. For example, at the same latency level of 5 NFE, EPD achieves an FID of 4.47 on CIFAR-10, 7.97 on FFHQ, 8.17 on ImageNet, and 8.26 on LSUN Bedroom, surpassing existing learning-based solvers by a significant margin. Code is available at https://github.com/BeierZhu/EPD.
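The idea of combining several independent gradient evaluations within one ODE step can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the intermediate times `taus`, and the `weights` are hypothetical stand-ins for the small set of learnable parameters the abstract mentions, and the test ODE dx/dt = -x is chosen only because its exact solution is known.

```python
import math

def euler_step(f, x, t, t_next):
    """Plain Euler step: a single gradient evaluation at the start point."""
    return x + (t_next - t) * f(x, t)

def parallel_direction_step(f, x, t, t_next, taus, weights):
    """One step that averages gradients taken at several intermediate times.

    taus    : intermediate times between t and t_next (assumed learnable)
    weights : non-negative weights summing to 1 (assumed learnable)
    """
    d0 = f(x, t)  # gradient at the current point
    # Each evaluation below depends only on (x, d0, tau), not on the others,
    # so in a real sampler they could be batched and run in parallel.
    grads = [f(x + (tau - t) * d0, tau) for tau in taus]
    direction = sum(w * g for w, g in zip(weights, grads))
    return x + (t_next - t) * direction

def f(x, t):
    # Toy ODE dx/dt = -x with exact solution x(t) = x(0) * exp(-t).
    return -x

x0, t0, t1 = 1.0, 0.0, 1.0
exact = x0 * math.exp(-(t1 - t0))

euler_result = euler_step(f, x0, t0, t1)
ensemble_result = parallel_direction_step(
    f, x0, t0, t1, taus=[0.4, 0.6], weights=[0.5, 0.5]
)

euler_err = abs(euler_result - exact)
ensemble_err = abs(ensemble_result - exact)
print(f"Euler error:    {euler_err:.4f}")
print(f"Ensemble error: {ensemble_err:.4f}")
```

On this toy problem, the averaged direction gives a smaller one-step error than Euler with hand-picked values. In the paper's setting the corresponding parameters are optimized by distillation rather than chosen by hand.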

