GPTAQ
GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration
LLM quantization · first seen Apr 3, 2025
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 3 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites GPTAQ as a baseline.
“This cross-layer residual is beneficial for reducing accumulated quantization errors; however, it may also introduce additional Hessian-approximation (HA) bias.”
— MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
Beaten on benchmarks
Head-to-head results where a newer method reports beating GPTAQ. Values are copied from the source paper's tables — verify against the cited paper.
- VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
VLMQ beats GPTAQ · Avg [Qwen2-VL-7B-Instruct-INT3g128]
74.40 vs 73.68
- MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
GPTAQ-MARR beats GPTAQ · Wiki2 [Llama2-7b W4A4]
5.83 vs 5.86
- MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
GPTAQ-MARR beats GPTAQ · Avg [Llama2-7b W4A4]
67.94 vs 66.98
- MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
GPTAQ-MARR beats GPTAQ · Wiki2 [Llama2-7b W2A4]
9.56 vs 11.24
- MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
GPTAQ-MARR beats GPTAQ · Avg [Llama2-7b W2A4]
48.86 vs 47.65
- MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization
GPTAQ-MARR beats GPTAQ · Avg [Llama2-13b W4A4]
70.79 vs 70.63
- Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Quant-dLLM beats GPTAQ · Average [2-bit weight quantization on LLaDA-8B-Base]
54.06 vs 35.87
- Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Quant-dLLM beats GPTAQ · Average [2-bit weight quantization on LLaDA-8B-Instruct]
55.53 vs 41.34
- Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Quant-dLLM beats GPTAQ · Average [2-bit weight quantization on LLaDA-1.5]
54.21 vs 37.58
- Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Quant-dLLM beats GPTAQ · Average [2-bit weight quantization on Dream-7B-Base]
44.75 vs 31.21
- Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Quant-dLLM beats GPTAQ · Average [2-bit weight quantization on Dream-7B-Instruct]
47.99 vs 31.30
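The size of each reported gap can be compared with a short script. This is a minimal sketch, not part of any cited paper: the `margin` helper and the `lower_is_better` flag are hypothetical names introduced here, and it assumes that "Wiki2" rows are WikiText-2 perplexity (lower is better) while "Avg"/"Average" rows are accuracy averages (higher is better). The numbers are copied from the listing above and should be verified against the cited papers.

```python
# Each row: (label, newer method's score, GPTAQ's score, lower_is_better).
# ASSUMPTION: "Wiki2" rows are perplexity (lower is better);
# "Avg"/"Average" rows are accuracy averages (higher is better).
results = [
    ("VLMQ Avg [Qwen2-VL-7B-Instruct-INT3g128]", 74.40, 73.68, False),
    ("GPTAQ-MARR Wiki2 [Llama2-7b W4A4]", 5.83, 5.86, True),
    ("GPTAQ-MARR Avg [Llama2-7b W4A4]", 67.94, 66.98, False),
    ("GPTAQ-MARR Wiki2 [Llama2-7b W2A4]", 9.56, 11.24, True),
    ("GPTAQ-MARR Avg [Llama2-7b W2A4]", 48.86, 47.65, False),
    ("GPTAQ-MARR Avg [Llama2-13b W4A4]", 70.79, 70.63, False),
    ("Quant-dLLM Average [LLaDA-8B-Base, 2-bit]", 54.06, 35.87, False),
    ("Quant-dLLM Average [LLaDA-8B-Instruct, 2-bit]", 55.53, 41.34, False),
    ("Quant-dLLM Average [LLaDA-1.5, 2-bit]", 54.21, 37.58, False),
    ("Quant-dLLM Average [Dream-7B-Base, 2-bit]", 44.75, 31.21, False),
    ("Quant-dLLM Average [Dream-7B-Instruct, 2-bit]", 47.99, 31.30, False),
]


def margin(newer: float, baseline: float, lower_is_better: bool) -> float:
    """Signed gap between the two scores; positive means the newer method wins."""
    return baseline - newer if lower_is_better else newer - baseline


if __name__ == "__main__":
    for label, newer, baseline, lower_is_better in results:
        gap = margin(newer, baseline, lower_is_better)
        relative = 100.0 * gap / baseline  # gap as a percentage of GPTAQ's score
        print(f"{label}: {gap:+.2f} ({relative:.1f}% of GPTAQ's score)")
```

Under these assumptions, the MARR gaps at W4A4 are small, while the Quant-dLLM gaps at 2-bit are much larger. The two groups are measured on different models and settings, so the margins are not directly comparable across papers.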
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 19, 2026
- May 18, 2026
- Quantization-aware Integrated Gradients (QIG) · Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients · Mar 18, 2026
- SPEED-Q · SPEED-Q: Staged Processing with Enhanced Distillation towards Efficient Low-bit On-device VLM Quantization · Nov 12, 2025
- Quant-dLLM · Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models · Sep 27, 2025