CV | DC | Aug 22, 2024

Real-Time Video Generation with Pyramid Attention Broadcast

arXiv:2408.12588v3 | 111 citations | h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck in real-time video generation for applications such as media production and AI-driven content creation. The advance is incremental, since it optimizes an existing framework rather than introducing a new one.

The paper tackles the problem of slow video generation in diffusion models by proposing Pyramid Attention Broadcast (PAB), a training-free method that reduces redundancy in attention mechanisms, resulting in up to 10.5x speedup and enabling real-time generation for 720p videos.

We present Pyramid Attention Broadcast (PAB), a real-time, high-quality and training-free approach for DiT-based video generation. Our method is founded on the observation that attention difference in the diffusion process exhibits a U-shaped pattern, indicating significant redundancy. We mitigate this by broadcasting attention outputs to subsequent steps in a pyramid style. It applies a different broadcast strategy to each attention type based on its variance for best efficiency. We further introduce broadcast sequence parallel for more efficient distributed inference. PAB demonstrates up to 10.5x speedup across three models compared to baselines, achieving real-time generation for up to 720p videos. We anticipate that our simple yet effective method will serve as a robust baseline and facilitate future research and application for video generation.
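The broadcasting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes three attention types (spatial, temporal, cross), hypothetical broadcast ranges of 2, 4 and 6 steps, and a hypothetical window of middle diffusion steps in which reuse is allowed; the window stands in for the low-difference region of the U-shaped pattern. The names `BroadcastCache`, `attend` and `run_demo` are invented for illustration, and the "attention output" is a placeholder tuple rather than a tensor.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

# Hypothetical broadcast ranges: how many consecutive steps share one
# computed attention output. A larger range means more reuse.
BROADCAST_RANGE: Dict[str, int] = {"spatial": 2, "temporal": 4, "cross": 6}


@dataclass
class BroadcastCache:
    """Reuses a cached attention output for several consecutive steps."""

    # Inclusive (first_step, last_step) window where reuse is allowed.
    # Outside it, attention is always recomputed (assumed high-difference region).
    window: Tuple[int, int]
    ranges: Dict[str, int] = field(default_factory=lambda: dict(BROADCAST_RANGE))
    # kind -> (step at which the output was computed, output)
    _store: Dict[str, Tuple[int, Any]] = field(default_factory=dict, init=False)
    # kind -> number of real attention computations performed
    computed: Dict[str, int] = field(default_factory=dict, init=False)

    def attend(self, kind: str, step: int, compute: Callable[[], Any]) -> Any:
        first, last = self.window
        in_window = first <= step <= last
        cached = self._store.get(kind)
        # Broadcast: reuse the cached output while it is still within range.
        if in_window and cached is not None and step - cached[0] < self.ranges[kind]:
            return cached[1]
        # Otherwise pay for a real attention computation.
        out = compute()
        self.computed[kind] = self.computed.get(kind, 0) + 1
        if in_window:
            self._store[kind] = (step, out)
        else:
            self._store.pop(kind, None)
        return out


def run_demo(num_steps: int = 30, window: Tuple[int, int] = (5, 24)) -> Dict[str, int]:
    """Counts how many attention computations each type needs over a full run."""
    cache = BroadcastCache(window=window)
    for step in range(num_steps):
        for kind in BROADCAST_RANGE:
            # Placeholder for a real attention call; records where it was computed.
            cache.attend(kind, step, lambda k=kind, s=step: (k, s))
    return cache.computed


if __name__ == "__main__":
    print(run_demo())
```

Under these assumed settings, a 30-step run computes spatial attention 20 times, temporal attention 15 times and cross attention 14 times, instead of 30 each. The tiered ranges are what gives the scheme its pyramid shape. Which attention type tolerates the longest reuse is an assumption of this sketch, not a figure taken from the text above.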

Code Implementations: 1 repo
