LG · AI · Oct 23, 2025

Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression

Xi Zhang, Xiaolin Wu, Jiamang Wang, Weisi Lin

arXiv:2510.20984v1 · 2 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of deploying large models under stringent resource constraints, representing an incremental improvement in quantization methods.

The paper tackles the problem of performance degradation in low-bit post-training quantization of large language models by introducing a Grouped Lattice Vector Quantization framework, which achieves a better trade-off between model size and accuracy compared to existing baselines.

Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can effectively reduce these demands by storing weights in lower bit-width formats. However, standard uniform quantization often leads to notable performance degradation, particularly in low-bit scenarios. In this work, we introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook, defined by a learnable generation matrix. To address the non-differentiability of the quantization process, we adopt Babai rounding to approximate nearest-lattice-point search during training, which enables stable optimization of the generation matrices. Once trained, decoding reduces to a simple matrix-vector multiplication, yielding an efficient and practical quantization pipeline. Experiments on multiple benchmarks show that our approach achieves a better trade-off between model size and accuracy compared to existing post-training quantization baselines, highlighting its effectiveness in deploying large models under stringent resource constraints. Our source code is available on GitHub: https://github.com/xzhang9308/GLVQ.
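The encode/decode path described in the abstract can be sketched in a few lines. The following is a minimal pure-Python sketch, not the authors' implementation, and it makes several assumptions. Each group of weights shares one d x d generation matrix G whose columns are the lattice basis vectors. A group is split into consecutive d-dimensional sub-vectors. Encoding uses Babai rounding, z = round(G^-1 w), and decoding is the matrix-vector product w_hat = G z. The function names (`solve`, `babai_encode`, `decode`, `glvq_quantize`) and the hand-picked matrices are hypothetical illustrations. Learning G during training and bounding the integer codes to a bit budget are both omitted.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # d x d generation matrix; columns are lattice basis vectors
Vector = List[float]


def solve(G: Matrix, w: Vector) -> Vector:
    """Solve G x = w by Gaussian elimination with partial pivoting."""
    n = len(G)
    # Augmented copy [G | w] so the caller's matrix is not modified.
    A = [[float(v) for v in row] + [float(b)] for row, b in zip(G, w)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("generation matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / A[r][r]
    return x


def babai_encode(G: Matrix, w: Vector) -> List[int]:
    """Babai rounding: express w in the lattice basis, then round each coordinate.

    This approximates, but does not guarantee, the nearest lattice point.
    Python's round() rounds exact halves to even; such ties are rare here.
    """
    return [round(c) for c in solve(G, w)]


def decode(G: Matrix, z: List[int]) -> Vector:
    """Decoding is a single matrix-vector product: w_hat = G z."""
    return [sum(g * zi for g, zi in zip(row, z)) for row in G]


def glvq_quantize(
    weights: Vector, gens: List[Matrix], group_size: int
) -> Tuple[List[List[int]], Vector]:
    """Quantize consecutive groups of weights, each with its own generation matrix."""
    dim = len(gens[0])
    if group_size % dim != 0 or len(weights) != len(gens) * group_size:
        raise ValueError("weights must split evenly into groups and sub-vectors")
    codes: List[List[int]] = []
    recon: Vector = []
    for g, G in enumerate(gens):
        group = weights[g * group_size : (g + 1) * group_size]
        for start in range(0, group_size, dim):
            z = babai_encode(G, group[start : start + dim])
            codes.append(z)  # the integer lattice coordinates are what gets stored
            recon.extend(decode(G, z))
    return codes, recon


# Usage: two groups of four weights, quantized as 2-D sub-vectors.
G_square = [[0.5, 0.0], [0.0, 0.5]]       # scaled integer lattice with step 0.5
G_hex = [[1.0, 0.5], [0.0, 3 ** 0.5 / 2]]  # hexagonal (A2) lattice basis
weights = [0.26, -0.74, 0.1, 0.2, 0.6, 0.9, -0.4, 0.05]
codes, recon = glvq_quantize(weights, [G_square, G_hex], group_size=4)
print(codes)  # [[1, -1], [0, 0], [0, 1], [0, 0]]
```

Because Babai rounding only rounds coordinates in the given basis, it can miss the true nearest lattice point when the basis is far from orthogonal; this is the sense in which it approximates nearest-lattice-point search. Decoding, by contrast, is exact and costs one d x d matrix-vector product per sub-vector.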

