CV | AI | Feb 20, 2025

Hardware-Friendly Static Quantization Method for Video Diffusion Transformers

arXiv:2502.15077v31 citations | h-index: 2 | MIPR
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient deployment of generative-AI models on AI processors, though it is incremental as it adapts existing quantization techniques to a specific model.

The paper tackles the problem of deploying Video Diffusion Transformers on resource-constrained devices by proposing a static quantization method, achieving video quality comparable to FP16 and dynamic quantization with similar CLIP and VQA metrics.

Diffusion Transformers for video generation have gained significant research interest since the impressive performance of SORA. Efficient deployment of such generative-AI models on GPUs has been demonstrated with dynamic quantization. However, resource-constrained devices cannot support dynamic quantization and need statically quantized models for efficient deployment on AI processors. In this paper, we propose a novel method for the post-training quantization of OpenSora, a Video Diffusion Transformer, without relying on dynamic quantization techniques. Our approach employs static quantization, achieving video quality comparable to FP16 and dynamically quantized ViDiT-Q methods, as measured by CLIP and VQA metrics. In particular, we use per-step calibration data to produce a post-training statically quantized model for each time step, with channel-wise quantization for weights and tensor-wise quantization for activations. By further applying the smooth-quantization technique, we obtain high-quality video outputs with the statically quantized models. Extensive experimental results demonstrate that static quantization can be a viable alternative to dynamic quantization for video diffusion transformers, offering a more efficient approach without sacrificing performance.
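The ingredients named in the abstract (per-step calibration, channel-wise weight scales, a single tensor-wise activation scale fixed offline, and smooth-quantization) can be illustrated for one linear layer. The following is a minimal sketch, not the authors' implementation: it assumes symmetric INT8 quantization and a SmoothQuant-style factor with alpha = 0.5, and every function name, the toy weight matrix, and the calibration values are hypothetical.

```python
# Sketch only: static INT8 quantization of one linear layer, calibrated per time step.
# All names and values here are illustrative assumptions, not taken from the paper.

QMAX = 127  # symmetric signed 8-bit range


def absmax(values):
    """Largest absolute value, with a small floor to avoid division by zero."""
    return max(abs(v) for v in values) or 1e-8


def quantize(value, scale):
    """Round to the nearest integer step and clamp to [-QMAX, QMAX]."""
    return max(-QMAX, min(QMAX, round(value / scale)))


def smooth_factors(calib_acts, weight, alpha=0.5):
    """Per-input-channel factors s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    factors = []
    for j in range(len(weight[0])):
        act_max = absmax(row[j] for row in calib_acts)
        w_max = absmax(row[j] for row in weight)
        factors.append(act_max ** alpha / w_max ** (1 - alpha))
    return factors


def calibrate_step(calib_acts, weight, alpha=0.5):
    """Compute all quantization parameters offline for one time step."""
    s = smooth_factors(calib_acts, weight, alpha)
    acts = [[x / sj for x, sj in zip(row, s)] for row in calib_acts]
    w = [[wij * sj for wij, sj in zip(row, s)] for row in weight]
    # Tensor-wise: one activation scale for the whole tensor, fixed before inference.
    act_scale = absmax(v for row in acts for v in row) / QMAX
    # Channel-wise: one weight scale per output channel (row of the weight matrix).
    w_scales = [absmax(row) / QMAX for row in w]
    w_q = [[quantize(v, sc) for v in row] for row, sc in zip(w, w_scales)]
    return {"smooth": s, "act_scale": act_scale, "w_scales": w_scales, "w_q": w_q}


def quantized_linear(x, params):
    """Apply the layer using only precomputed (static) scales."""
    x_q = [quantize(v / sj, params["act_scale"]) for v, sj in zip(x, params["smooth"])]
    out = []
    for row, w_scale in zip(params["w_q"], params["w_scales"]):
        acc = sum(a * b for a, b in zip(x_q, row))  # integer accumulation
        out.append(acc * params["act_scale"] * w_scale)  # dequantize
    return out


# Toy data: weight is [out_channels][in_channels]; activations differ per time step.
weight = [[0.5, -0.25, 0.1], [0.2, 0.4, -0.3]]
calib = {
    0: [[1.0, 8.0, -0.5], [0.8, -6.0, 0.4]],
    1: [[0.2, 1.0, -0.1], [-0.3, 0.9, 0.2]],
}

# One set of static parameters per time step, computed before deployment.
per_step = {t: calibrate_step(acts, weight) for t, acts in calib.items()}

x = calib[0][0]
y_quant = quantized_linear(x, per_step[0])
y_float = [sum(w * v for w, v in zip(row, x)) for row in weight]
```

Because the activation scale is looked up per time step rather than computed from the live tensor, nothing in `quantized_linear` depends on runtime statistics, which is the property that distinguishes static from dynamic quantization. Dividing activations and multiplying weights by the same factors leaves the float product unchanged while moving outlier magnitude from activations into weights.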

