LG · AI · May 27, 2025

Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning

Maosen Zhao, Pengtao Chen, Chong Yu, Yan Wen, Xudong Tan, Tao Chen

arXiv:2505.21591v1 · CVPR

Originality: Highly original

AI Analysis

This work addresses the efficient deployment of diffusion models in applications that require lower memory use and faster inference. It represents an incremental but practical advance in model compression.

The paper tackles 4-bit quantization of diffusion models. Quantizing to 4 bits improves memory efficiency and inference speed, but existing 4-bit methods deliver inconsistent performance. The authors propose a mixup-sign floating-point quantization framework with timestep-aware fine-tuning that outperforms existing 4-bit integer quantization methods.

Model quantization reduces the bit-width of weights and activations, improving memory efficiency and inference speed in diffusion models. However, achieving 4-bit quantization remains challenging. Existing methods, primarily based on integer (INT) quantization and post-training quantization (PTQ) fine-tuning, deliver inconsistent performance. Inspired by the success of floating-point (FP) quantization in large language models, we explore low-bit FP quantization for diffusion models and identify three key challenges: signed FP quantization fails to handle asymmetric activation distributions; fine-tuning gives insufficient consideration to the temporal complexity of the denoising process; and the fine-tuning loss is misaligned with the quantization error. To address these challenges, we propose the mixup-sign floating-point quantization (MSFP) framework, which is the first to introduce unsigned FP quantization into model quantization, together with timestep-aware LoRA (TALoRA) and denoising-factor loss alignment (DFA), which ensure precise and stable fine-tuning. Extensive experiments show that we are the first to achieve superior performance in 4-bit FP quantization for diffusion models, outperforming existing PTQ fine-tuning methods in 4-bit INT quantization.
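To illustrate why an unsigned FP format can help with asymmetric activations, here is a minimal Python sketch. It is not the MSFP implementation. The unsigned E2M2 format (2 exponent bits, 2 mantissa bits, bias 1), the per-tensor max-based scale, the zero-point offset on the unsigned path, and the SiLU-shaped toy tensor are all assumptions chosen for illustration. The sketch compares a signed 4-bit E2M1 grid, whose sign bit is largely wasted on a mostly nonnegative tensor, against a hypothetical unsigned 4-bit grid that spends that bit on extra mantissa precision.

```python
import math


def fp_grid(exp_bits: int, man_bits: int, bias: int, signed: bool) -> list[float]:
    """List every value of a tiny FP format; no codes are reserved for inf/NaN."""
    values = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            frac = m / 2 ** man_bits
            if e == 0:  # subnormal: no implicit leading 1
                mag = frac * 2.0 ** (1 - bias)
            else:  # normal: implicit leading 1
                mag = (1.0 + frac) * 2.0 ** (e - bias)
            values.add(mag)
            if signed:
                values.add(-mag)  # -0.0 == 0.0, so zero is stored once
    return sorted(values)


def fake_quantize(xs: list[float], grid: list[float], signed: bool) -> list[float]:
    """Round each value to the nearest scaled grid point and map it back to floats."""
    if signed:
        offset = 0.0  # symmetric: the grid is centered on zero
        span = max(abs(x) for x in xs)
    else:
        offset = min(xs)  # asymmetric: shift so the smallest value maps to 0
        span = max(xs) - offset
    if span == 0.0:
        return list(xs)
    scale = span / max(grid)  # map the data span onto the largest grid value
    out = []
    for x in xs:
        target = (x - offset) / scale
        q = min(grid, key=lambda g: abs(target - g))
        out.append(q * scale + offset)
    return out


def mse(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def silu(t: float) -> float:
    return t / (1.0 + math.exp(-t))


# Toy asymmetric "activation" tensor: SiLU over inputs in [-4, 4]
# gives values in roughly [-0.28, 3.93].
acts = [silu(-4.0 + 8.0 * i / 800) for i in range(801)]

E2M1 = fp_grid(exp_bits=2, man_bits=1, bias=1, signed=True)    # 1 sign + 2 exp + 1 mantissa bit
UE2M2 = fp_grid(exp_bits=2, man_bits=2, bias=1, signed=False)  # 2 exp + 2 mantissa bits, no sign

mse_signed = mse(acts, fake_quantize(acts, E2M1, signed=True))
mse_unsigned = mse(acts, fake_quantize(acts, UE2M2, signed=False))
print(f"signed E2M1   MSE: {mse_signed:.5f}")
print(f"unsigned E2M2 MSE: {mse_unsigned:.5f}")
```

Both formats use 4 bits per value, so the comparison isolates how that budget is spent; the unsigned path does need one extra per-tensor offset, much like the zero point in asymmetric INT quantization. A mixup-sign scheme would presumably also have to decide which layers get the signed versus the unsigned format and calibrate the scales, which this sketch omits.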

