LG | Apr 3, 2025

GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration

arXiv:2504.02692v3 | 42 citations | h-index: 11 | Has Code | ICML
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient model compression in AI, particularly for large transformers, though it appears incremental as it builds upon the GPTQ method.

The authors address the compression of large-scale transformer architectures by introducing GPTAQ, a finetuning-free quantization method that uses asymmetric calibration to reduce the quantization error accumulated in previous layers. With it, they quantize models such as a 405B language transformer and the EVA-02 vision transformer on a single GPU.

We introduce GPTAQ, a novel finetuning-free quantization method for compressing large-scale transformer architectures. Unlike the previous GPTQ method, which calibrates each layer independently, we always match the quantized layer's output to the exact output in the full-precision model, resulting in a scheme that we call asymmetric calibration. Such a scheme can effectively reduce the quantization error accumulated in previous layers. We analyze this problem using optimal brain compression to derive a closed-form solution. The new solution explicitly minimizes the quantization error as well as the accumulated asymmetry error. Furthermore, we utilize various techniques to parallelize the solution calculation, including channel parallelization, neuron decomposition, and Cholesky reformulation for matrix fusion. As a result, GPTAQ is easy to implement, using only 20 more lines of code than GPTQ while improving on its performance under low-bit quantization. Remarkably, on a single GPU, we quantize a 405B language transformer as well as EVA-02, the top-ranked vision transformer, which achieves 90% pretraining ImageNet accuracy. Code is available on GitHub.
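
The difference between the two calibration targets described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation or their closed-form solution. It only evaluates the two objectives on toy data. The names `W`, `W_hat`, `X_fp`, and `X_q`, the noise level that stands in for error accumulated in earlier layers, and the round-to-nearest quantizer are all assumptions made for illustration.

```python
import numpy as np

def symmetric_error(W, W_hat, X_q):
    # Layer-independent target: both weight matrices see the same input
    # (here, the input produced by the already-quantized earlier layers).
    return np.linalg.norm(W @ X_q - W_hat @ X_q) ** 2

def asymmetric_error(W, W_hat, X_fp, X_q):
    # Asymmetric target: the quantized layer's output (W_hat on X_q) is
    # compared with the exact full-precision output (W on X_fp).
    return np.linalg.norm(W @ X_fp - W_hat @ X_q) ** 2

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))            # full-precision weights (toy sizes)
X_fp = rng.normal(size=(8, 64))        # layer input in the full-precision model
# Assumption: Gaussian noise stands in for error accumulated in earlier layers.
X_q = X_fp + 0.1 * rng.normal(size=(8, 64))
# Assumption: a toy round-to-nearest quantizer with step 0.25.
W_hat = np.round(W * 4) / 4

print("symmetric :", symmetric_error(W, W_hat, X_q))
print("asymmetric:", asymmetric_error(W, W_hat, X_fp, X_q))
```

In this toy setting, leaving the weights unquantized (`W_hat = W`) drives the symmetric error to zero while the asymmetric error stays positive, because the input drift `X_q - X_fp` is invisible to the symmetric target. That residual is the accumulated error the asymmetric scheme is meant to account for.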

Code Implementations: 2 repos