LG CL Nov 6, 2025

DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization

Yuantian Shao, Yuanteng Chen, Peisong Wang, Jianlin Yu, Jing Lin, Yiwu Yao, Zhihui Wei, Jian Cheng

arXiv:2511.04063v1, 17.98 citations, h-index: 15, Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of making large language model quantization feasible in resource-constrained environments, representing an incremental improvement over existing rotational methods.

The paper tackles the high computational cost and overfitting in rotational optimization for LLM quantization by proposing DartQuant, an efficient distribution-aware rotational calibration method that achieves 47× acceleration and 10× memory savings on a 70B model, enabling calibration on a single 3090 GPU.

Quantization plays a crucial role in accelerating the inference of large-scale models, and rotational matrices have been shown to effectively improve quantization performance by smoothing outliers. However, end-to-end fine-tuning of rotational optimization algorithms incurs high computational costs and is prone to overfitting. To address this challenge, we propose an efficient distribution-aware rotational calibration method, DartQuant, which reduces the complexity of rotational optimization by constraining the distribution of the activations after rotation. This approach also effectively reduces reliance on task-specific losses, thereby mitigating the risk of overfitting. Additionally, we introduce the QR-Orth optimization scheme, which replaces expensive alternating optimization with a more efficient solution. In a variety of model quantization experiments, DartQuant demonstrates superior performance. Compared to existing methods, it achieves 47× acceleration and 10× memory savings for rotational optimization on a 70B model. Furthermore, it is the first to successfully complete rotational calibration for a 70B model on a single 3090 GPU, making quantization of large language models feasible in resource-constrained environments. Code is available at https://github.com/CAS-CLab/DartQuant.git.
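
The two ideas in the abstract, rotation smoothing outliers and an orthogonal matrix obtained without alternating optimization, can be illustrated with a small NumPy sketch. This is not the DartQuant implementation: it assumes, from the name only, that QR-Orth derives the rotation from the Q factor of a QR decomposition of an unconstrained matrix, and the helper names (`orthogonal_from_latent`, `peak_to_rms`), the synthetic activations, and the random (unoptimized) latent matrix are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def orthogonal_from_latent(z):
    """Map an unconstrained square matrix to an orthogonal one via QR.

    Assumption: this mirrors the general idea of a QR-based orthogonal
    parameterization, not the paper's exact QR-Orth procedure.
    """
    q, r = np.linalg.qr(z)
    # Flip column signs so diag(R) is non-negative, which makes the map unique.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs  # Q @ diag(signs) is still orthogonal


def peak_to_rms(a):
    """Largest magnitude relative to the RMS value: a crude outlier measure."""
    return np.abs(a).max() / np.sqrt(np.mean(a ** 2))


d = 64
x = rng.normal(size=(256, d))   # synthetic activations (hypothetical data)
x[:, 3] *= 50.0                 # inject one outlier channel
w = rng.normal(size=(d, d))     # synthetic weights of the following layer

# A random latent matrix stands in for one that would be learned in calibration.
rot = orthogonal_from_latent(rng.normal(size=(d, d)))

x_rot = x @ rot                 # rotated activations
w_rot = rot.T @ w               # counter-rotated weights

orth_error = np.abs(rot.T @ rot - np.eye(d)).max()
outputs_match = np.allclose(x_rot @ w_rot, x @ w)
ratio_before = peak_to_rms(x)
ratio_after = peak_to_rms(x_rot)

print(f"orthogonality error: {orth_error:.2e}")
print(f"layer output preserved: {outputs_match}")
print(f"peak-to-RMS before rotation: {ratio_before:.2f}")
print(f"peak-to-RMS after rotation:  {ratio_after:.2f}")
```

Because the rotation is orthogonal, the layer output is unchanged while the outlier energy is spread across channels, which lowers the peak-to-RMS ratio that a uniform quantizer has to cover. A calibration method would go further by optimizing the latent matrix against a distribution objective on the rotated activations; that step is omitted here.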
