CV · Nov 28, 2024

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

arXiv:2411.19108v2 · 190 citations · h-index: 17 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses inference efficiency for video generation and offers a practical improvement over existing caching methods, though the advance is incremental because it builds on prior caching strategies.

The paper tackles the low inference speed of video diffusion models by proposing TeaCache, a training-free caching approach that uses timestep embeddings to estimate how much model outputs differ between timesteps. It achieves up to 4.41x acceleration over Open-Sora-Plan with minimal visual-quality degradation (-0.07% VBench score).

As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly selected timesteps. However, such a strategy neglects the fact that differences among model outputs are not uniform across timesteps, which hinders selecting the appropriate model outputs to cache and leads to a poor balance between inference efficiency and visual quality. In this study, we introduce Timestep Embedding Aware Cache (TeaCache), a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps. Rather than directly using the time-consuming model outputs, TeaCache focuses on model inputs, which have a strong correlation with the model outputs while incurring negligible computational cost. TeaCache first modulates the noisy inputs using the timestep embeddings so that their differences better approximate those of the model outputs. TeaCache then introduces a rescaling strategy to refine the estimated differences and uses them to decide when to reuse cached outputs. Experiments show that TeaCache achieves up to 4.41x acceleration over Open-Sora-Plan with negligible (-0.07% VBench score) degradation of visual quality.
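
The caching rule above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `modulate`, `rel_l1`, and `teacache_denoise` are hypothetical. The scale/shift modulation, the identity polynomial used for rescaling, the thresholds, and the toy model and sampler are placeholders; the paper fits its rescaling per model. For simplicity the sketch caches the full model output. The idea it shows: accumulate a rescaled relative L1 difference between consecutive timestep-modulated inputs, reuse the cached output while the total stays below a threshold, and recompute (resetting the total) once it is crossed.

```python
import numpy as np


def modulate(x, t_emb):
    # Hypothetical adaLN-style modulation (assumption): the first half of
    # t_emb acts as a scale, the second half as a shift.
    scale, shift = np.split(t_emb, 2)
    return x * (1.0 + scale) + shift


def rel_l1(curr, prev):
    # Relative L1 distance between consecutive modulated inputs.
    return float(np.abs(curr - prev).mean() / np.abs(prev).mean())


def teacache_denoise(model, x, t_embs, step_fn, coeffs, threshold):
    """Reuse the cached model output while the accumulated, rescaled input
    difference stays below `threshold`.

    model(x, t_emb) -> output; step_fn(x, output, i) -> next x.
    coeffs: polynomial coefficients (highest degree first) mapping the input
    difference to an estimated output difference (hypothetical values here).
    Returns the final sample and the number of model evaluations.
    """
    rescale = np.poly1d(coeffs)
    num_steps = len(t_embs)
    prev_mod, cached_out, acc, n_calls = None, None, 0.0, 0
    for i, t_emb in enumerate(t_embs):
        mod = modulate(x, t_emb)
        if i == 0 or i == num_steps - 1:
            compute = True  # always evaluate the first and last steps
        else:
            acc += rescale(rel_l1(mod, prev_mod))
            compute = acc >= threshold
        if compute:
            cached_out = model(x, t_emb)
            n_calls += 1
            acc = 0.0  # reset the accumulator after a fresh evaluation
        prev_mod = mod  # compare against the previous step's input next time
        x = step_fn(x, cached_out, i)
    return x, n_calls


# Toy usage: a stand-in "model" and sampler update (both hypothetical).
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
t_embs = [0.1 * rng.standard_normal(8) for _ in range(10)]
model = lambda x, t: np.tanh(x + t[:4])
step = lambda x, out, i: x - 0.1 * out

x_full, calls_full = teacache_denoise(model, x0, t_embs, step, [1.0, 0.0], 0.0)
x_fast, calls_cached = teacache_denoise(model, x0, t_embs, step, [1.0, 0.0], 1e9)
print(calls_full, calls_cached)  # → 10 2
```

A threshold of 0 forces a model call at every step. A very large threshold leaves only the first and last steps. Intermediate thresholds trade speed for fidelity, which is the efficiency/quality balance the abstract describes.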

Code Implementations: 1 repo
