CV | May 24, 2025

DVD-Quant: Data-free Video Diffusion Transformers Quantization

arXiv:2505.18663v2 | 9 citations | h-index: 13 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses deployment challenges for video generation models and offers a practical solution for researchers and practitioners. It is incremental in that it builds on existing quantization methods.

The paper tackles the computational and memory inefficiency of video Diffusion Transformers (DiTs) by proposing DVD-Quant, a data-free quantization framework. It reports an approximately 2x speedup over full-precision models while maintaining visual fidelity, and it enables W4A4 (4-bit weights, 4-bit activations) post-training quantization without compromising video quality.
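To make the W4A4 setting concrete, here is a minimal sketch of generic symmetric round-to-nearest quantization applied to both the weights and the activation of a small matrix-vector product. It is not DVD-Quant itself: the function names (`quantize_4bit`, `w4a4_matvec`), the per-row scaling, the `[-7, 7]` integer range, and the toy numbers are all assumptions chosen for illustration.

```python
from typing import List, Tuple

QMAX = 7  # assumed symmetric signed 4-bit range: integers in [-7, 7]


def quantize_4bit(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one vector to 4-bit integers."""
    max_abs = max(abs(v) for v in vec)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    q = [max(-QMAX, min(QMAX, round(v / scale))) for v in vec]
    return q, scale


def w4a4_matvec(weights: List[List[float]], activation: List[float]) -> List[float]:
    """Approximate W @ x with both operands quantized to 4 bits."""
    x_q, x_scale = quantize_4bit(activation)
    out = []
    for w_row in weights:
        w_q, w_scale = quantize_4bit(w_row)  # one scale per weight row
        acc = sum(wi * xi for wi, xi in zip(w_q, x_q))  # integer accumulation
        out.append(acc * w_scale * x_scale)  # rescale back to floating point
    return out


def matvec(weights: List[List[float]], activation: List[float]) -> List[float]:
    """Full-precision reference result."""
    return [sum(w * x for w, x in zip(row, activation)) for row in weights]


# Toy data (hypothetical values, not taken from the paper).
weights = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.7]]
activation = [1.0, -0.5, 3.0]

exact = matvec(weights, activation)
approx = w4a4_matvec(weights, activation)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print("full precision:", exact)
print("W4A4 estimate: ", approx)
print("max abs error: ", max_err)
```

The gap between the two results is the quantization error that methods in this area try to reduce. In a real deployment the speedup comes from running the integer accumulation on low-bit hardware kernels, which this pure-Python sketch does not model.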

Diffusion Transformers (DiTs) have emerged as the state-of-the-art architecture for video generation, yet their computational and memory demands hinder practical deployment. While post-training quantization (PTQ) presents a promising approach to accelerate Video DiT models, existing methods suffer from two critical limitations: (1) dependence on computation-heavy and inflexible calibration procedures, and (2) considerable performance deterioration after quantization. To address these challenges, we propose DVD-Quant, a novel Data-free quantization framework for Video DiTs. Our approach integrates three key innovations: (1) Bounded-init Grid Refinement (BGR) and (2) Auto-scaling Rotated Quantization (ARQ) for calibration data-free quantization error reduction, as well as (3) $\delta$-Guided Bit Switching ($\delta$-GBS) for adaptive bit-width allocation. Extensive experiments across multiple video generation benchmarks demonstrate that DVD-Quant achieves an approximately 2$\times$ speedup over full-precision baselines on advanced DiT models while maintaining visual fidelity. Notably, DVD-Quant is the first to enable W4A4 PTQ for Video DiTs without compromising video quality. Code and models will be available at https://github.com/lhxcs/DVD-Quant.
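The abstract does not spell out how ARQ works, so the following is only a sketch of the general idea behind rotation-based quantization, not the authors' method. It assumes a fixed 4x4 normalized Hadamard rotation and the same symmetric 4-bit rounding as a stand-in quantizer; `fake_quantize`, `rotate`, and the example vector are hypothetical. The point it illustrates is that an orthogonal rotation spreads a single outlier across all coordinates, which shrinks the quantization step and the resulting error.

```python
from typing import List

QMAX = 7  # assumed symmetric signed 4-bit range: integers in [-7, 7]


def fake_quantize(vec: List[float]) -> List[float]:
    """Quantize to 4-bit integers and immediately dequantize."""
    max_abs = max(abs(v) for v in vec)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    return [max(-QMAX, min(QMAX, round(v / scale))) * scale for v in vec]


# Normalized 4x4 Hadamard matrix: symmetric and orthogonal, so it is its own inverse.
H4 = [
    [0.5 * s for s in row]
    for row in ([1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1])
]


def rotate(vec: List[float]) -> List[float]:
    """Apply the Hadamard rotation; applying it twice returns the input."""
    return [sum(h * v for h, v in zip(row, vec)) for row in H4]


def squared_error(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical activation vector with one large outlier.
x = [0.3, -0.4, 0.35, 8.0]

# Quantize directly: the outlier dictates the step size and small values collapse to 0.
direct_err = squared_error(x, fake_quantize(x))

# Rotate, quantize in the rotated basis, then rotate back.
rotated_err = squared_error(x, rotate(fake_quantize(rotate(x))))

print("error without rotation:", direct_err)
print("error with rotation:   ", rotated_err)
```

Because the rotation is orthogonal, the error measured after rotating back equals the error in the rotated basis, so the comparison between the two printed values is a fair one.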

Code Implementations: 1 repo
