CL · AI · Jun 9

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

Haoyu Wang, Xingyu Yu, Haiyan Zhao, Fengxiang Wang, Xu Han
arXiv:2606.10531v1 · 8.5
Originality: Highly original
AI Analysis

This work addresses the performance degradation of 2-bit quantized LLMs by enabling efficient vector quantization with end-to-end training, making extreme low-bit deployment more practical.

LC-QAT introduces a 2-bit weight-only vector quantization framework for LLMs that uses a learned affine mapping over discrete vectors, enabling end-to-end differentiable training without codebook lookup. It achieves state-of-the-art results while using only 0.1%–10% of the training data that existing QAT methods require.

Quantization-aware training (QAT) is essential for extremely low-bit large language models (LLMs). Current QAT methods are mainly based on scalar quantization (SQ), which enables efficient optimization but suffers from severe performance degradation at 2-bit precision. On the other hand, vector quantization (VQ) provides substantially higher representational capacity, but its discrete codebook lookup prevents end-to-end training. We propose LC-QAT, a 2-bit weight-only VQ-QAT framework that represents quantized weights via a learned affine mapping over discrete vectors, which yields a high-quality PTQ initialization and enables fully differentiable end-to-end optimization without explicit codebook lookup in the training forward pass. This strong post-training initialization makes LC-QAT highly data-efficient. Experiments across diverse LLMs demonstrate that LC-QAT consistently outperforms state-of-the-art QAT methods while using only 0.1%–10% of the training data. Our results establish LC-QAT as a practical and scalable solution for extreme low-bit model deployment.
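To make the idea of "a learned affine mapping over discrete vectors" concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the abstract does not give the exact parameterization, so the form `w_hat = A @ z + b`, the group size `d = 4`, the code alphabet {0, 1, 2, 3}, and the names `dequantize`, `explicit_codebook`, and `affine_grads` are all assumptions made for illustration. The sketch shows that such a mapping implicitly defines a codebook of 4^d entries (2 bits per weight) without storing or indexing it, and that gradients with respect to `A` and `b` are ordinary matrix products.

```python
import itertools
import numpy as np


def dequantize(codes, A, b):
    """Map discrete code vectors to weights with an affine map (assumed form).

    codes: (n, d) integers in {0, 1, 2, 3}, i.e. 2 bits per weight.
    A:     (d, d) learned matrix, b: (d,) learned offset.
    """
    return codes.astype(A.dtype) @ A.T + b


def explicit_codebook(A, b, levels=4):
    """Enumerate the codebook the affine map implies (for comparison only)."""
    d = A.shape[1]
    grid = np.array(list(itertools.product(range(levels), repeat=d)))
    return dequantize(grid, A, b)


def affine_grads(codes, upstream):
    """Gradients of a loss w.r.t. A and b, given dL/dw_hat of shape (n, d)."""
    grad_A = upstream.T @ codes.astype(upstream.dtype)
    grad_b = upstream.sum(axis=0)
    return grad_A, grad_b


rng = np.random.default_rng(0)
d = 4  # hypothetical vector (group) dimension
A = 0.5 * np.eye(d) + 0.05 * rng.standard_normal((d, d))
b = np.full(d, -0.75)
codes = rng.integers(0, 4, size=(8, d))  # 8 weight groups, fixed discrete codes

# Forward pass: a matrix product, no table lookup.
w_hat = dequantize(codes, A, b)

# Same result via an explicit 256-entry codebook and base-4 indexing.
codebook = explicit_codebook(A, b)
idx = codes @ (4 ** np.arange(d - 1, -1, -1))
lookup = codebook[idx]

bits_per_weight = np.log2(len(codebook)) / d

# Backward pass for an arbitrary upstream gradient.
grad_A, grad_b = affine_grads(codes, np.ones_like(w_hat))
```

Under these assumptions the lookup and the affine map produce identical weights, but only the affine form exposes `A` and `b` as continuous parameters that a standard optimizer can update while the discrete codes stay fixed.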

