CV · Aug 12, 2025

TaoCache: Structure-Maintained Video Generation Acceleration

arXiv:2508.08978v1 · 5 citations · h-index: 3
Originality: Incremental advance
AI Analysis

The paper addresses the loss of structure and consistency in accelerated video generation with diffusion models. It is an incremental improvement over existing caching techniques.

The paper tackles structural discrepancies in cache-based acceleration for video diffusion models by introducing TaoCache, a training-free caching strategy that predicts the model's noise output from a fixed-point perspective. It reports better visual quality scores (LPIPS, SSIM, PSNR) than prior caching methods at the same speedups.

Existing cache-based acceleration methods for video diffusion models primarily skip early or mid denoising steps, which often leads to structural discrepancies relative to full-timestep generation and can hinder instruction following and character consistency. We present TaoCache, a training-free, plug-and-play caching strategy that, instead of residual-based caching, adopts a fixed-point perspective to predict the model's noise output and is specifically effective in late denoising stages. By calibrating cosine similarities and norm ratios of consecutive noise deltas, TaoCache preserves high-resolution structure while enabling aggressive skipping. The approach is orthogonal to complementary accelerations such as Pyramid Attention Broadcast (PAB) and TeaCache, and it integrates seamlessly into DiT-based frameworks. Across Latte-1, OpenSora-Plan v110, and Wan2.1, TaoCache attains substantially higher visual quality (LPIPS, SSIM, PSNR) than prior caching methods under the same speedups.
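The abstract mentions calibrating cosine similarities and norm ratios of consecutive noise deltas. The sketch below only illustrates what those two statistics are and one hypothetical way they could be used to extrapolate a skipped step. It is not the authors' algorithm: the function names (`delta_stats`, `predict_next`), the flat-list representation of a noise output, and the extrapolation rule (scaling the last delta by cosine times norm ratio) are all assumptions made for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def delta_stats(noise_outputs):
    """Return (cosine similarity, norm ratio) for each pair of consecutive noise deltas.

    noise_outputs: list of flat lists, one per denoising step (hypothetical layout).
    A delta is the difference between the noise outputs of two adjacent steps.
    """
    deltas = [
        [cur - prev for prev, cur in zip(p, c)]
        for p, c in zip(noise_outputs, noise_outputs[1:])
    ]
    stats = []
    for d_prev, d_cur in zip(deltas, deltas[1:]):
        n_prev, n_cur = _norm(d_prev), _norm(d_cur)
        if n_prev == 0.0 or n_cur == 0.0:
            # Degenerate case: no direction to compare, so report zeros.
            stats.append((0.0, 0.0))
            continue
        cos_sim = _dot(d_prev, d_cur) / (n_prev * n_cur)
        stats.append((cos_sim, n_cur / n_prev))
    return stats


def predict_next(eps_prev, eps_cur, cos_sim, ratio):
    """Assumed extrapolation rule: reuse the last delta, scaled by cos_sim * ratio.

    This stands in for a full model call at a skipped step; the real method
    may differ.
    """
    last_delta = [c - p for p, c in zip(eps_prev, eps_cur)]
    scale = cos_sim * ratio
    return [c + scale * d for c, d in zip(eps_cur, last_delta)]


# Toy trajectory whose deltas shrink by half each step along a fixed direction.
trajectory = [[0.0, 0.0], [1.0, 0.0], [1.5, 0.0], [1.75, 0.0]]
calibrated = delta_stats(trajectory)
cos_sim, ratio = calibrated[-1]
predicted = predict_next(trajectory[-2], trajectory[-1], cos_sim, ratio)
```

In this toy case the deltas stay aligned and halve in magnitude, so the calibrated statistics are a cosine of 1.0 and a ratio of 0.5, and the extrapolated output continues the same geometric trend. In practice the statistics would be gathered offline from full-timestep runs, which is an assumption about how "calibrating" is meant here.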

