CV · Dec 16, 2024

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration

arXiv:2412.11706v3 · 15 citations · h-index: 6 · ICML
Originality: Incremental advance
AI Analysis

For AI researchers and practitioners, this work addresses the computational bottleneck in video generation, offering an incremental but practical acceleration method.

The paper tackles the high computational cost of video Diffusion Transformers (DiTs) with AsymRnR, a training-free, model-agnostic method that asymmetrically reduces redundant tokens in the attention operation. It achieves substantial speedups with negligible quality degradation, and in some cases it even improves output quality.
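The asymmetric reduce-then-restore idea can be illustrated with a small sketch. This is not the authors' implementation: the matching rule (a ToMe-style bipartite split into even- and odd-indexed tokens), the separate `q_ratio` and `kv_ratio` parameters, the log-size bias on kept keys, and all function names are assumptions chosen for illustration. Queries and keys/values are reduced with different ratios; because attention produces one output row per query, only the query side needs restoration, done here by copying each dropped query's output from the token it was matched to.

```python
import math


def cosine(a, b):
    """Cosine similarity between two token vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def match_redundant(tokens, ratio):
    """Hypothetical ToMe-style bipartite matching (not the paper's exact rule).

    Even-indexed tokens are destinations, odd-indexed tokens are sources. Each
    source is paired with its most similar destination, and the most similar
    pairs (up to ratio * len(tokens)) are marked redundant.
    Returns (kept indices, {dropped index: destination index}).
    """
    dst = list(range(0, len(tokens), 2))
    src = list(range(1, len(tokens), 2))
    pairs = []
    for s in src:
        sim, d = max((cosine(tokens[s], tokens[j]), j) for j in dst)
        pairs.append((sim, s, d))
    pairs.sort(reverse=True)  # most redundant sources first
    n_drop = min(int(ratio * len(tokens)), len(src))
    dropped = {s: d for _, s, d in pairs[:n_drop]}
    kept = [i for i in range(len(tokens)) if i not in dropped]
    return kept, dropped


def attention(q, k, v, sizes=None):
    """Softmax attention; sizes[j] = number of original tokens key j stands for."""
    sizes = sizes or [1] * len(k)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # log(size) bias: a key standing in for n identical keys gets n-fold weight
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) + math.log(n)
                  for kj, n in zip(k, sizes)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z
                    for c in range(len(v[0]))])
    return out


def asym_rnr_attention(q, k, v, q_ratio, kv_ratio):
    """Reduce queries and keys/values with separate ratios, then restore outputs."""
    # Key/value side: drop redundant keys (and their values). No restoration is
    # needed because attention outputs are indexed by queries, not keys.
    kv_kept, kv_dropped = match_redundant(k, kv_ratio)
    counts = {j: 1 for j in kv_kept}
    for d in kv_dropped.values():
        counts[d] += 1  # the kept key now stands in for one more token
    sizes = [counts[j] for j in kv_kept]
    # Query side: attend only with the kept queries.
    q_kept, q_dropped = match_redundant(q, q_ratio)
    out_kept = attention([q[i] for i in q_kept],
                         [k[j] for j in kv_kept], [v[j] for j in kv_kept], sizes)
    # Restoration: put kept outputs back in place, then copy each dropped
    # query's output from its matched (always kept, even-indexed) destination.
    out = [None] * len(q)
    for pos, i in enumerate(q_kept):
        out[i] = out_kept[pos]
    for s, d in q_dropped.items():
        out[s] = list(out[d])
    return out
```

On a toy input where every odd row exactly repeats the preceding even row, this reduced computation matches full attention up to floating-point error; with merely similar tokens it is an approximation whose error depends on how redundant the dropped tokens really are.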

Diffusion Transformers (DiTs) have proven effective in generating high-quality videos but are hindered by high computational costs. Existing video DiT sampling acceleration methods often rely on costly fine-tuning or exhibit limited generalization capabilities. We propose Asymmetric Reduction and Restoration (AsymRnR), a training-free and model-agnostic method to accelerate video DiTs. It builds on the observation that the redundancy of feature tokens in DiTs varies significantly across model blocks, denoising steps, and feature types. AsymRnR asymmetrically reduces redundant tokens in the attention operation, achieving acceleration with negligible degradation in output quality and, in some cases, even improving it. We also tailor a reduction schedule to distribute the reduction across components adaptively. To further accelerate this process, we introduce a matching cache for more efficient reduction. Backed by theoretical foundations and extensive experimental validation, AsymRnR integrates into state-of-the-art video DiTs and offers substantial speedup.
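The reduction schedule and the matching cache can be sketched as two small data structures. The abstract does not specify either one, so everything below is hypothetical: `ReductionSchedule` looks up a ratio per (block, feature type) and applies it only inside a window of denoising steps, and `MatchingCache` reuses a stored matching for `refresh_every` consecutive steps, under the assumption that token similarities change slowly between adjacent denoising steps.

```python
from dataclasses import dataclass, field


@dataclass
class ReductionSchedule:
    """Hypothetical schedule: a reduction ratio per (block, feature type),
    enabled only for denoising steps in [start_step, end_step)."""
    ratios: dict  # e.g. {(3, "q"): 0.5, (3, "kv"): 0.25}
    start_step: int
    end_step: int

    def ratio(self, block, feature, step):
        if not (self.start_step <= step < self.end_step):
            return 0.0  # no reduction outside the step window
        return self.ratios.get((block, feature), 0.0)


@dataclass
class MatchingCache:
    """Hypothetical cache: reuse a matching for `refresh_every` consecutive
    steps instead of recomputing token similarities at every step."""
    refresh_every: int
    _store: dict = field(default_factory=dict)
    misses: int = 0

    def get(self, block, feature, step, compute):
        key = (block, feature)
        entry = self._store.get(key)
        # Recompute when nothing is cached or the cached matching is too old.
        if entry is None or step - entry[0] >= self.refresh_every:
            self.misses += 1
            entry = (step, compute())
            self._store[key] = entry
        return entry[1]
```

The `compute` callback would wrap whatever matching function is in use (for example a similarity-based bipartite matcher); each cache hit skips one full similarity computation, which is where the extra saving would come from.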
