GPTQ
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
LLM quantization · first seen Oct 31, 2022
heavily superseded — a standard baseline that newer methods routinely beat
13 papers critique it · 28 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites GPTQ as a baseline.
“While GPTQ significantly reduces the local layer-wise MSE, its effect of reducing the global NLL loss is minimal given the same training data and same trainable weights.”
— Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs
“as these techniques do not involve gradient-based optimization, unless task-specific calibration data is utilized, they can suffer substantial accuracy degradation on more challenging benchmarks, particularly text generation”
— LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
“existing PTQ approaches for LLMs minimize the layer-wise reconstruction loss while treating all tokens uniformly [lin2023awq, frantar2022gptq, li2025gptqv2], without accounting for token-level informativeness or importance. Such a token-agnostic design inevitably biases the quantized model toward dominant but redundant visual features”
— VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
“However, under low-bit quantization, quantization errors from preceding layers accumulate across the network, making local-only basic reconstruction insufficient.”
— MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
“However, GPTQ and SmoothQuant (SQ), which are strong PTQ methods for pure LLMs, do not reliably improve performance in this multimodal setting.”
— Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients
“We further observe that generic PTQ methods such as GPTQ and AWQ suffer significant performance degradation under W4A16, highlighting the challenge of directly applying standard quantization techniques to VGGT.”
— QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
“the same routines collapse on sub-7B models where redundancy is scarce”
— Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
“GPTVQ accumulates quantization errors within vector quantization, leading to an inevitable increase in quantization errors as the vector length increases.”
— VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
“Despite its empirical success, the GPTQ algorithm was only presented as a sequence of greedily applied algebraic operations: the procedure picks one weight at a time, quantizes it via rounding or clipping, and then optimally updates the not-yet-quantized weights to correct for the remaining per-layer loss; it then continues with the next weight, and so on. This procedure leaves an obvious open question: why does a local greedy rule work so well globally? Current literature does not answer this question, leaving little guidance for principled extensions or failure case analysis.”
— The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
“GPTQ [frantar2022gptq] poorly handles outliers due to calibration dependence”
— Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models
“However, the weight updates are computed in a closed form based on second-order gradient information; this is done for each layer separately, which does not consider the dependencies among layers.”
— Exploring Model Invariance with Discrete Search for Ultra-Low-Bit Quantization
“because GPTQ optimizes only for reconstruction accuracy, it can unintentionally increase group-targeting biases, which we aim to reduce with Fair-GPTQ”
— Fair-GPTQ: Bias-Aware Quantization for Large Language Models
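The greedy procedure described in the quote from the Babai's Nearest Plane paper (quantize one weight, then update the not-yet-quantized weights to absorb the error) can be sketched for a single weight row. This is a minimal NumPy illustration under stated assumptions, not the GPTQ reference implementation: the function names, the uniform grid with a single `scale`, the `damp` value, and the explicit inverse-Hessian update (used here for readability in place of a Cholesky-based formulation) are all hypothetical choices made for this example.

```python
import numpy as np

def quantize_rtn(w, scale):
    """Round-to-nearest onto a uniform grid with step `scale` (assumed grid)."""
    return np.round(w / scale) * scale

def layer_error(w, q, X):
    """Layer-wise squared reconstruction error ||(w - q) X||^2."""
    return float(np.sum(((w - q) @ X) ** 2))

def greedy_quantize_row(w, X, scale, damp=0.01):
    """Quantize one weight row in fixed order with error compensation.

    w: (d,) weight row; X: (d, n) calibration inputs (hypothetical layout).
    """
    w = w.astype(np.float64).copy()
    d = w.shape[0]
    # Layer-wise Hessian proxy, damped so that it is invertible.
    H = X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(d)
    Hinv = np.linalg.inv(H)
    q = np.zeros(d)
    for i in range(d):
        q[i] = quantize_rtn(w[i], scale)
        err = (w[i] - q[i]) / Hinv[i, i]
        # Push the rounding error onto the weights not yet quantized.
        w[i + 1:] -= err * Hinv[i, i + 1:]
        # Eliminate coordinate i so the trailing block stays the inverse
        # of the Hessian restricted to the remaining weights.
        Hinv[i + 1:, i + 1:] -= (
            np.outer(Hinv[i + 1:, i], Hinv[i, i + 1:]) / Hinv[i, i]
        )
    return q
```

With identity calibration inputs the inverse Hessian is diagonal, so no error is propagated and the result coincides with plain round-to-nearest. With correlated inputs, the compensation step is what would typically lower `layer_error` relative to `quantize_rtn`, though this sketch makes no guarantee for any particular input.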
Beaten on benchmarks
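Head-to-head results where a newer method reports beating GPTQ. Each result lists the newer method's score first and GPTQ's second; lower is better for perplexity, higher is better for GSM8K and accuracy. Values are copied from the source paper's tables, so verify them against the cited paper.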
- HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · Wiki2 (Perplexity) [Llama-3-8B (W4A8)]
6.96 vs 8.81
- HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · GSM8K [Llama-3-8B (W4A8)]
75.26 vs 71.89
- HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · GSM8K [Qwen2.5-7B (W4A8)]
86.12 vs 81.67
- HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · Wiki2 (Perplexity) [Llama-3-8B (W4A16)]
6.43 vs 6.65
- HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · GSM8K [Qwen2.5-7B (W4A16)]
87.50 vs 85.20
- HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · Wiki2 (Perplexity) [Llama-3-8B (W3A16)]
8.02 vs 20.13
- HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · GSM8K [Llama-3-8B (W3A16)]
70.15 vs 26.30
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats GPTQ · perplexity [2-bit]
53.75 vs 2381.23
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats GPTQ · perplexity [3-bit]
30.37 vs 42.01
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats GPTQ · perplexity [4-bit]
27.78 vs 29.33
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats GPTQ · Accuracy [2-bit, 7B]
48.91 vs 35.19
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats GPTQ · Accuracy [2-bit, 13B]
59.62 vs 35.70
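Perplexity is the exponential of the mean per-token negative log-likelihood, so the perplexity gaps listed above can also be read as differences in average NLL. The helper below is a small illustrative sketch; the function names are hypothetical and it assumes NLL is measured in nats.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood), NLL in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

def nll_gap_nats(ppl_new, ppl_baseline):
    """Mean per-token NLL difference implied by two perplexities."""
    return math.log(ppl_baseline) - math.log(ppl_new)
```

Applied to the Wiki2 entries above, `nll_gap_nats(6.43, 6.65)` is roughly 0.03 nats per token at W4A16, while `nll_gap_nats(8.02, 20.13)` is roughly 0.92 nats at W3A16.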
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 29, 2026
- LFQ · LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs · May 28, 2026
- ADMM-Q · ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models · May 11, 2026
- May 6, 2026
- Apr 11, 2026
- Jan 21, 2026
- Grouped Lattice Vector Quantization (GLVQ) · Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression · Oct 23, 2025
- Sep 28, 2025
- Bi-VLM · Bi-VLM: Pushing Ultra-Low Precision Post-Training Quantization Boundaries in Vision-Language Models · Sep 23, 2025
- Sep 18, 2025