CL, AI | Dec 4, 2025

SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

Wenhua Cheng, Weiwei Zhang, Heng Guo, Haihao Shen

arXiv:2512.04746v1 | 1 citation | h-index: 3 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need to deploy LLMs efficiently on resource-constrained devices and represents an incremental improvement over existing quantization methods.

The paper tackles the severe performance degradation seen in extremely low-bit post-training quantization of Large Language Models (LLMs). It introduces SignRoundV2, a framework that closes the gap with full-precision models, reporting about 1% variance at 4-5 bits and strong results at 2 bits.

Extreme low-bit quantization is critical for efficiently deploying Large Language Models (LLMs), yet it often leads to severe performance degradation at 2 bits and even 4 bits (e.g., MXFP4). We present SignRoundV2, a post-training quantization framework that is highly effective even without mixed precision. SignRoundV2 introduces (1) a fast sensitivity metric that combines gradient information with quantization-induced deviations to guide layer-wise bit allocation, and (2) a lightweight pre-tuning search for quantization scales to improve extremely low-bit quantization. These components allow SignRoundV2 to close the gap with full-precision models. Extensive experiments indicate that our method sustains competitive accuracy for LLMs, achieving production-grade performance with about 1 percent variance at 4-5 bits and strong results even at 2 bits. The implementation is available at https://github.com/intel/auto-round.
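
The abstract names two components without giving formulas, so the following is only a toy sketch of the general ideas, not the paper's method or the auto-round API. It assumes a first-order sensitivity proxy (the sum of |gradient x quantization error| per layer), a plain grid search over clipping ratios for the scale, and a simple top-k rule for bit allocation. All function names and parameters here are hypothetical.

```python
def quantize(weights, bits, scale):
    """Symmetric round-to-nearest quantize-dequantize with a given scale."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    return [max(qmin, min(qmax, round(w / scale))) * scale for w in weights]


def search_scale(weights, bits, ratios=None):
    """Grid-search a clipping ratio; keep the scale with the lowest squared error.

    Assumption: a plain grid search stands in for the paper's pre-tuning search.
    Requires bits >= 2 so that the positive range is non-empty.
    """
    if ratios is None:
        ratios = [0.5 + 0.05 * i for i in range(11)]  # roughly 0.5 to 1.0
    max_abs = max(abs(w) for w in weights) or 1.0  # guard all-zero weights
    qmax = 2 ** (bits - 1) - 1
    best_scale, best_err = None, float("inf")
    for ratio in ratios:
        scale = ratio * max_abs / qmax
        deq = quantize(weights, bits, scale)
        err = sum((w - d) ** 2 for w, d in zip(weights, deq))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err


def sensitivity(weights, grads, bits):
    """Assumed proxy: sum over weights of |gradient * quantization error|."""
    scale, _ = search_scale(weights, bits)
    deq = quantize(weights, bits, scale)
    return sum(abs(g * (w - d)) for w, g, d in zip(weights, grads, deq))


def allocate_bits(layers, low_bits, high_bits, n_high):
    """Give high_bits to the n_high layers that are most sensitive at low_bits.

    `layers` maps a layer name to a (weights, grads) pair of equal-length lists.
    """
    scores = {name: sensitivity(w, g, low_bits) for name, (w, g) in layers.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep_high = set(ranked[:n_high])
    return {name: (high_bits if name in keep_high else low_bits) for name in layers}
```

Given per-layer weights and gradients from a small calibration pass, `allocate_bits(layers, 2, 4, n_high)` would keep the `n_high` most sensitive layers at 4 bits and quantize the rest at 2 bits. A real allocator would also have to respect an average-bit budget and layer sizes, which this sketch ignores.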
