GPTQ (LLM quantization): heavily superseded, a standard baseline that newer methods routinely beat. 13 papers critique it and 28 beat it on benchmarks, ranking it #1 of 80 most-superseded. Sub-problem: cluster led by GPTQ. Newer alternatives in the same sub-problem include QVGGT, LFQ, ADMM-Q, OSAQ, SEPTQ.

Heavily superseded · #1 of 80 most-superseded

GPTQ

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

LLM quantization · first seen Oct 31, 2022

heavily superseded — a standard baseline that newer methods routinely beat

13 papers critique it · 28 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites GPTQ as a baseline.

“While GPTQ significantly reduces the local layer-wise MSE, its effect of reducing the global NLL loss is minimal given the same training data and same trainable weights.”
— Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs
“as these techniques do not involve gradient-based optimization, unless task-specific calibration data is utilized, they can suffer substantial accuracy degradation on more challenging benchmarks, particularly text generation”
— LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
“existing PTQ approaches for LLMs minimize the layer-wise reconstruction loss while treating all tokens uniformly [lin2023awq, frantar2022gptq, li2025gptqv2], without accounting for token-level informativeness or importance. Such a token-agnostic design inevitably biases the quantized model toward dominant but redundant visual features”
— VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
“However, under low-bit quantization, quantization errors from preceding layers accumulate across the network, making local-only basic reconstruction insufficient.”
— MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
“However, GPTQ and SmoothQuant (SQ), which are strong PTQ methods for pure LLMs, do not reliably improve performance in this multimodal setting.”
— Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients
“We further observe that generic PTQ methods such as GPTQ and AWQ suffer significant performance degradation under W4A16, highlighting the challenge of directly applying standard quantization techniques to VGGT.”
— QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
“the same routines collapse on sub-7B models where redundancy is scarce”
— Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
“GPTVQ accumulates quantization errors within vector quantization, leading to an inevitable increase in quantization errors as the vector length increases.”
— VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
“Despite its empirical success, the GPTQ algorithm was only presented as a sequence of greedily applied algebraic operations: the procedure picks one weight at a time, quantizes it via rounding or clipping, and then optimally updates the not-yet-quantized weights to correct for the remaining per-layer loss; it then continues with the next weight, and so on. This procedure leaves an obvious open question: why does a local greedy rule work so well globally? Current literature does not answer this question, leaving little guidance for principled extensions or failure case analysis.”
— The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
“GPTQ [frantar2022gptq] poorly handles outliers due to calibration dependence”
— Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models
“However, the weight updates are computed in a closed form based on second-order gradient information; this is done for each layer separately, which does not consider the dependencies among layers.”
— Exploring Model Invariance with Discrete Search for Ultra-Low-Bit Quantization
“because GPTQ optimizes only for reconstruction accuracy, it can unintentionally increase group-targeting biases, which we aim to reduce with Fair-GPTQ”
— Fair-GPTQ: Bias-Aware Quantization for Large Language Models
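
Several of the critiques above refer to the same mechanics: a per-layer reconstruction loss, a greedy pass that quantizes one weight column at a time, and a closed-form second-order update of the not-yet-quantized weights. The sketch below illustrates that procedure in NumPy so the criticized properties (layer-local objective, calibration dependence) are concrete. It is a minimal illustration, not the reference implementation: the function names, the symmetric per-row grid, and the `damp` default are assumptions made for this example.

```python
import numpy as np

def rtn_quantize(w, scale, qmax):
    """Round-to-nearest onto a symmetric uniform grid, with clipping."""
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def gptq_like_quantize(W, X, bits=4, damp=0.01):
    """Greedy column-by-column quantization with error compensation.

    W: (d_out, d_in) layer weights.
    X: (n_samples, d_in) calibration inputs to this layer only.
    Returns Q with the same shape as W, every entry on the grid.
    """
    W = W.astype(np.float64).copy()
    d_in = W.shape[1]
    qmax = 2 ** (bits - 1) - 1
    # One scale per output row, fixed before any weight is updated (assumed grid).
    scale = np.abs(W).max(axis=1) / qmax
    scale = np.where(scale == 0, 1.0, scale)

    # Hessian of the layer-wise loss ||X W^T - X Q^T||^2, with diagonal damping.
    H = 2.0 * X.T @ X
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)
    # Upper-triangular Cholesky factor U of the inverse Hessian (H^-1 = U^T U).
    U = np.linalg.cholesky(np.linalg.inv(H)).T

    Q = np.zeros_like(W)
    for i in range(d_in):
        # Quantize column i by rounding and clipping.
        Q[:, i] = rtn_quantize(W[:, i], scale, qmax)
        # Spread the resulting error over the columns not yet quantized.
        err = (W[:, i] - Q[:, i]) / U[i, i]
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])
    return Q

def layer_error(W, Q, X):
    """Squared reconstruction error of the layer output on inputs X."""
    return float(np.sum((X @ W.T - X @ Q.T) ** 2))
```

Note that only this layer's inputs `X` enter the objective. Nothing in the loop sees the model's final loss or the errors of other layers, which is the limitation the layer-wise and error-accumulation critiques point at. When the calibration features are uncorrelated (the Hessian is a multiple of the identity), the compensation term vanishes and the procedure reduces to plain round-to-nearest.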

Beaten on benchmarks

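Head-to-head results where a newer method reports beating GPTQ. Each pair lists the newer method's value first and GPTQ's second; lower is better for perplexity, higher is better for GSM8K and accuracy. Values are copied from the source paper's tables; verify against the cited paper.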

HeRo-Q beats GPTQ · Wiki2 (Perplexity) [Llama-3-8B (W4A8)]
6.96 vs 8.81
HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · GSM8K [Llama-3-8B (W4A8)]
75.26 vs 71.89
HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · GSM8K [Qwen2.5-7B (W4A8)]
86.12 vs 81.67
HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · Wiki2 (Perplexity) [Llama-3-8B (W4A16)]
6.43 vs 6.65
HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · GSM8K [Qwen2.5-7B (W4A16)]
87.50 vs 85.20
HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · Wiki2 (Perplexity) [Llama-3-8B (W3A16)]
8.02 vs 20.13
HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
HeRo-Q beats GPTQ · GSM8K [Llama-3-8B (W3A16)]
70.15 vs 26.30
HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
SEPTQ beats GPTQ · perplexity [2-bit]
53.75 vs 2381.23
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats GPTQ · perplexity [3-bit]
30.37 vs 42.01
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats GPTQ · perplexity [4-bit]
27.78 vs 29.33
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats GPTQ · Accuracy [2-bit, 7B]
48.91 vs 35.19
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats GPTQ · Accuracy [2-bit, 13B]
59.62 vs 35.70
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
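
As a reading aid for the perplexity rows above: perplexity is the exponential of the mean per-token negative log-likelihood, so gaps are easier to compare as relative reductions than as raw differences. The short sketch below uses two HeRo-Q rows copied from this list; the helper names are hypothetical and chosen for this example.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def relative_reduction(new, baseline):
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline

# Values copied from the HeRo-Q rows above (Llama-3-8B, Wiki2 perplexity).
w4a16 = relative_reduction(6.43, 6.65)   # newer method vs GPTQ at W4A16
w3a16 = relative_reduction(8.02, 20.13)  # newer method vs GPTQ at W3A16

print(f"W4A16: {w4a16:.1%} lower perplexity")
print(f"W3A16: {w3a16:.1%} lower perplexity")
```

On these rows the reported gap is roughly 3% at W4A16 and roughly 60% at W3A16, which matches the pattern in the critiques: the baseline degrades most at lower bit widths.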

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.