CV · Jan 29

Past- and Future-Informed KV Cache Policy with Salience Estimation in Autoregressive Video Diffusion

arXiv:2601.21896v3 · 6 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific problem for video generation researchers and practitioners, offering an incremental improvement over existing heuristic cache policies.

The paper tackles the problem of inefficient KV cache policies in autoregressive video generation, which degrade quality and efficiency by losing important spatiotemporal information. The proposed PaFu-KV method uses salience estimation to retain informative tokens, achieving high-fidelity video generation with accelerated inference, as demonstrated in benchmarks.

Video generation is pivotal to digital media creation, and recent advances in autoregressive video generation have markedly enhanced the efficiency of real-time video synthesis. However, existing approaches generally rely on heuristic KV cache policies, which ignore differences in token importance in long-term video generation. This leads to the loss of critical spatiotemporal information and the accumulation of redundant, invalid cache entries, thereby degrading video generation quality and efficiency. To address this limitation, we first observe that token contributions to video generation are highly time-heterogeneous and accordingly propose a novel Past- and Future-Informed KV Cache Policy (PaFu-KV). Specifically, PaFu-KV introduces a lightweight Salience Estimation Head distilled from a bidirectional teacher to estimate salience scores, allowing the KV cache to retain informative tokens while discarding less relevant ones. This policy yields a better quality-efficiency trade-off by shrinking KV cache capacity and reducing memory footprint at inference time. Extensive experiments on benchmarks demonstrate that our method preserves high-fidelity video generation quality while accelerating inference, thereby enabling more efficient long-horizon video generation. Our code will be released upon paper acceptance.
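The retain-or-discard idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `prune_kv_cache`, the fixed `budget` parameter, and the toy scores are all hypothetical, and the salience scores are simply assumed to be given (in the paper they would come from the Salience Estimation Head, whose details this summary does not specify). The sketch only shows one plausible reading of salience-based eviction: keep the highest-scoring tokens and preserve their original temporal order.

```python
from typing import List, Sequence, Tuple


def prune_kv_cache(
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    salience: Sequence[float],
    budget: int,
) -> Tuple[List[Sequence[float]], List[Sequence[float]], List[int]]:
    """Keep the `budget` highest-salience tokens, preserving original order.

    Hypothetical helper for illustration; the paper's actual policy may differ.
    """
    if not (len(keys) == len(values) == len(salience)):
        raise ValueError("keys, values, and salience must have equal length")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank token indices by salience, highest first; ties go to the earlier token.
    ranked = sorted(range(len(salience)), key=lambda i: (-salience[i], i))
    # Restore temporal order among the surviving tokens.
    kept = sorted(ranked[:budget])
    return [keys[i] for i in kept], [values[i] for i in kept], kept


# Toy cache of five tokens with made-up salience scores (assumption).
keys = [[0.1], [0.2], [0.3], [0.4], [0.5]]
values = [[1.0], [2.0], [3.0], [4.0], [5.0]]
scores = [0.9, 0.1, 0.7, 0.2, 0.8]

k, v, kept = prune_kv_cache(keys, values, scores, budget=3)
print(kept)  # [0, 2, 4]
```

With a budget of three, the two lowest-scoring tokens (indices 1 and 3) are evicted, so the cache shrinks from five entries to three while the retained entries stay in their original order. A heuristic policy such as a sliding window would instead keep the three most recent tokens regardless of score.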

