CVMar 7, 2025

Accelerating Diffusion Transformer via Gradient-Optimized Cache

Junxiang Qiu, Lin Liu, Shuo Wang, Jinda Lu, Kezhou Chen, Yanbin Hao

arXiv:2503.05156v215 citationsh-index: 7Has Code

Originality Incremental advance

AI Analysis

This work addresses efficiency-quality trade-offs in diffusion models for image generation, representing an incremental improvement over existing caching techniques.

The paper tackles the problem of error accumulation in feature caching for diffusion transformers, which degrades generation quality when over 50% of blocks are cached. The proposed Gradient-Optimized Cache (GOC) method improves performance, achieving a 26.3% higher IS score and 43% lower FID compared to baseline DiT with 50% cached blocks.

Feature caching has emerged as an effective strategy to accelerate diffusion transformer (DiT) sampling through temporal feature reuse. It is a challenging problem since (1) Progressive error accumulation from cached blocks significantly degrades generation quality, particularly when over 50\% of blocks are cached; (2) Current error compensation approaches neglect dynamic perturbation patterns during the caching process, leading to suboptimal error correction. To solve these problems, we propose the Gradient-Optimized Cache (GOC) with two key innovations: (1) Cached Gradient Propagation: A gradient queue dynamically computes the gradient differences between cached and recomputed features. These gradients are weighted and propagated to subsequent steps, directly compensating for the approximation errors introduced by caching. (2) Inflection-Aware Optimization: Through statistical analysis of feature variation patterns, we identify critical inflection points where the denoising trajectory changes direction. By aligning gradient updates with these detected phases, we prevent conflicting gradient directions during error correction. Extensive evaluations on ImageNet demonstrate GOC's superior trade-off between efficiency and quality. With 50\% cached blocks, GOC achieves IS 216.28 (26.3\% higher) and FID 3.907 (43\% lower) compared to baseline DiT, while maintaining identical computational costs. These improvements persist across various cache ratios, demonstrating robust adaptability to different acceleration requirements. Code is available at https://github.com/qiujx0520/GOC_ICCV2025.git.

View on arXiv PDF Code

Similar