CV · Mar 6

Training-free Latent Inter-Frame Pruning with Attention Recovery

Dennis Menn, Yuedong Yang, Bokun Wang, Xiwen Wei, Mustafa Munir, Feng Liang, Radu Marculescu, Chenfeng Xu, Diana Marculescu

arXiv:2603.05811v1 · 14.51 citations · h-index: 46

Predicted impact: top 20% in CV · last 90 days

Originality: Incremental advance

AI Analysis

This work addresses high computational latency in video generation, a key obstacle for real-time applications, and improves throughput for users of video editing tools.

This paper tackles the high computational latency of video generation models by identifying duplicated latent patches and skipping their recomputation. The proposed method, LIPAR, increases video editing throughput by 1.45x on average, reaching 12.2 FPS on an NVIDIA A6000 compared to a baseline of 8.4 FPS, without compromising generation quality.

Current video generation models suffer from high computational latency, making real-time applications prohibitively costly. In this paper, we address this limitation by exploiting the temporal redundancy inherent in video latent patches. To this end, we propose the Latent Inter-frame Pruning with Attention Recovery (LIPAR) framework, which detects and skips recomputing duplicated latent patches. Additionally, we introduce a novel Attention Recovery mechanism that approximates the attention values of pruned tokens, thereby removing visual artifacts arising from naively applying the pruning method. Empirically, our method increases video editing throughput by $1.45\times$, on average achieving 12.2 FPS on an NVIDIA A6000 compared to the baseline 8.4 FPS. The proposed method does not compromise generation quality and can be seamlessly integrated with the model without additional training. Our approach effectively bridges the gap between traditional compression algorithms and modern generative pipelines.
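To make the idea of skipping duplicated latent patches concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify how duplicates are detected, so the mean-absolute-difference test, the `tol` threshold, and the helper names `duplicate_mask` and `process_with_reuse` are all assumptions introduced here for illustration. The sketch also reuses cached outputs only for a per-token layer; the paper's Attention Recovery mechanism, which handles attention across tokens, is not reproduced.

```python
import numpy as np

def duplicate_mask(prev_tokens, cur_tokens, tol=1e-3):
    """Mark tokens that are (nearly) unchanged since the previous frame.

    Both inputs have shape (num_tokens, dim). The mean-absolute-difference
    criterion and `tol` are illustrative assumptions, not the paper's rule.
    """
    diff = np.abs(cur_tokens - prev_tokens).mean(axis=1)
    return diff < tol

def process_with_reuse(prev_tokens, prev_out, cur_tokens, layer_fn, tol=1e-3):
    """Apply a per-token layer only to changed tokens; reuse cached outputs.

    `layer_fn` must act on each token independently (e.g. an MLP). Attention
    mixes tokens, which is why naive reuse is not enough there.
    """
    dup = duplicate_mask(prev_tokens, cur_tokens, tol)
    out = np.empty_like(prev_out)
    out[dup] = prev_out[dup]                    # cached results for duplicates
    if (~dup).any():
        out[~dup] = layer_fn(cur_tokens[~dup])  # recompute changed tokens only
    return out, dup

# Toy example: 8 latent tokens, 2 of which change between frames.
rng = np.random.default_rng(0)
prev = rng.normal(size=(8, 4))
cur = prev.copy()
cur[[1, 5]] += 1.0

layer = lambda x: np.tanh(x) * 2.0  # hypothetical stand-in for a per-token layer
prev_out = layer(prev)
out, dup = process_with_reuse(prev, prev_out, cur, layer)
print(f"recomputed {int((~dup).sum())} of {dup.size} tokens")
```

In this toy setting the reused outputs match a full recomputation because the layer is per-token and the unchanged tokens are exactly identical. With attention layers that equivalence no longer holds, which is the gap the abstract says Attention Recovery is meant to close.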
