AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers
This work addresses inference efficiency for users of diffusion models, but the contribution is incremental because it builds on existing caching methods.
The paper tackles the problem of expensive inference in Diffusion Transformers (DiTs) by introducing AdaCorrection, an adaptive offset cache correction framework that maintains high generation fidelity while enabling efficient cache reuse. The approach achieves near-original FID with moderate acceleration on image and video diffusion benchmarks.
Diffusion Transformers (DiTs) achieve state-of-the-art performance in high-fidelity image and video generation but suffer from expensive inference due to their iterative denoising structure. While prior methods accelerate sampling by caching intermediate features, they rely on static reuse schedules or coarse-grained heuristics, which often lead to temporal drift and cache misalignment that significantly degrade generation quality. We introduce AdaCorrection, an adaptive offset cache correction framework that maintains high generation fidelity while enabling efficient cache reuse across Transformer layers during diffusion inference. At each timestep, AdaCorrection estimates cache validity with lightweight spatio-temporal signals and adaptively blends cached and fresh activations. This correction is computed on-the-fly without additional supervision or retraining. Our approach achieves strong generation quality with minimal computational overhead, maintaining near-original FID while providing moderate acceleration. Experiments on image and video diffusion benchmarks show that AdaCorrection consistently improves generation performance.
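To make the idea of estimating cache validity and blending cached with fresh activations concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not specify the signals or the blending rule, so the relative-change score, the exponential weighting, the `sharpness` parameter, and all function names below are illustrative assumptions.

```python
import math


def offset_score(x_now, x_cached, eps=1e-8):
    """Hypothetical validity signal: relative L2 change between the current
    layer input and the input that was seen when the cache was filled."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_now, x_cached)))
    ref = math.sqrt(sum(b * b for b in x_cached))
    return diff / (ref + eps)


def blend_weight(score, sharpness=5.0):
    """Map a non-negative score to a weight in [0, 1).
    0 means the cache is trusted fully; values near 1 favor fresh activations.
    The exponential form and `sharpness` are assumptions, not from the paper."""
    return 1.0 - math.exp(-sharpness * score)


def corrected_activation(cached_out, fresh_out, weight):
    """Convex combination of the cached and freshly computed layer outputs."""
    return [(1.0 - weight) * c + weight * f for c, f in zip(cached_out, fresh_out)]


if __name__ == "__main__":
    # Toy vectors standing in for one layer's input and output features.
    x_cached, x_now = [1.0, 0.0, 2.0], [1.1, 0.1, 1.9]
    cached_out, fresh_out = [0.5, 0.5, 0.5], [0.7, 0.4, 0.6]

    w = blend_weight(offset_score(x_now, x_cached))
    print("blend weight:", w)
    print("corrected output:", corrected_activation(cached_out, fresh_out, w))
```

In this sketch a small drift in the layer input yields a weight near zero, so the cached output is reused almost unchanged, while a large drift pushes the result toward the fresh activation. A practical system would presumably also decide when the fresh activation needs to be computed at all, which this toy example leaves out.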