CV · Oct 5, 2025

Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers

arXiv:2510.04188v1 · 4 citations · h-index: 6
Originality: Highly original
AI Analysis

This work addresses slow sampling in diffusion-based image and video synthesis. For AI practitioners, it offers a training-free acceleration method that is incremental but delivers strong performance gains.

The paper tackles the bottleneck of high computational cost in Diffusion Transformers' iterative sampling by introducing HyCa, a hybrid feature caching framework that applies dimension-wise caching strategies, achieving near-lossless acceleration with speedups of up to 6.24 times on various models without retraining.

Diffusion Transformers offer state-of-the-art fidelity in image and video synthesis, but their iterative sampling process remains a major bottleneck due to the high cost of a transformer forward pass at each timestep. To mitigate this, feature caching has emerged as a training-free acceleration technique that reuses or forecasts hidden representations. However, existing methods often apply a uniform caching strategy across all feature dimensions, ignoring their heterogeneous dynamic behaviors. We therefore adopt a new perspective, modeling hidden feature evolution as a mixture of ODEs across dimensions, and introduce HyCa, a hybrid ODE-solver-inspired caching framework that applies dimension-wise caching strategies. HyCa achieves near-lossless acceleration across diverse domains and models, including a 5.55× speedup on FLUX, a 5.56× speedup on HunyuanVideo, and a 6.24× speedup on Qwen-Image and Qwen-Image-Edit, all without retraining.
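To make the idea of dimension-wise caching concrete, here is a minimal sketch in Python. It is not the HyCa algorithm; the abstract does not specify how solvers are assigned. The sketch assumes only two candidate rules per feature dimension, reusing the last computed value (zeroth order) or extrapolating linearly from the last two (first order), and it assumes equally spaced timesteps. The function names `pick_solvers` and `hybrid_forecast` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def pick_solvers(history: Sequence[Vector]) -> List[str]:
    """Choose a caching rule per dimension (illustrative assumption, not HyCa).

    `history` holds fully computed feature vectors, oldest first, assumed to be
    equally spaced in time. For each dimension, both rules are asked to predict
    the newest vector from the older ones, and the rule with the smaller error
    is kept. Ties fall back to plain reuse.
    """
    if len(history) < 3:
        raise ValueError("need at least three computed feature vectors")
    f0, f1, f2 = history[-3], history[-2], history[-1]
    choices = []
    for a, b, c in zip(f0, f1, f2):
        reuse_error = abs(c - b)             # zeroth order: repeat b
        linear_error = abs(c - (2 * b - a))  # first order: extrapolate from a, b
        choices.append("linear" if linear_error < reuse_error else "reuse")
    return choices


def hybrid_forecast(history: Sequence[Vector], solvers: Sequence[str]) -> List[float]:
    """Forecast the next feature vector, applying each dimension's chosen rule."""
    f1, f2 = history[-2], history[-1]
    return [
        2 * c - b if solver == "linear" else c
        for b, c, solver in zip(f1, f2, solvers)
    ]


if __name__ == "__main__":
    # Three dimensions with different dynamics: constant, linear, oscillating.
    history = [
        [1.0, 0.0, 1.0],
        [1.0, 1.0, -1.0],
        [1.0, 2.0, 1.0],
    ]
    solvers = pick_solvers(history)
    print(solvers)                            # ['reuse', 'linear', 'reuse']
    print(hybrid_forecast(history, solvers))  # [1.0, 3.0, 1.0]
```

In this toy example the linearly drifting dimension is best served by extrapolation, while the constant and oscillating dimensions fall back to reuse. A single uniform rule would mispredict at least one of them, which is the kind of heterogeneity the abstract points to.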

