EfficientQAT
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
LLM quantization · first seen Jul 10, 2024
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites EfficientQAT as a baseline.
“The best quantization-aware training method, EfficientQAT, still suffers a 9.1-point decline in average accuracy. Our method dramatically narrows the 2-bit quantization gap to full precision to just 3.4 points, outperforming the best QAT method by 5.7 points and the vector quantization method by 7.1 points.”
— ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
Beaten on benchmarks
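Head-to-head results where a newer method reports beating EfficientQAT. Each pair lists the newer method's value first, then EfficientQAT's; for perplexity (PPL), lower is better. Values are copied from the source paper's tables, so verify them against the cited paper.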
- Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
Bit-by-Bit (Ours) beats EfficientQAT · WikiText2 PPL [LLaMA-2 7B, w2a16]
6.50 vs 7.39
- Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
Bit-by-Bit (Ours) beats EfficientQAT · WikiText2 PPL [LLaMA-3.2-3B, w2a16]
11.02 vs 13.31
- Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
Bit-by-Bit (Ours) beats EfficientQAT · WikiText2 PPL [LLaMA-2 7B, w2a2]
7.72 vs 9.71
- Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
Bit-by-Bit (Ours) beats EfficientQAT · WikiText2 PPL [LLaMA-3.2-3B, w2a2]
13.87 vs 20.19
- Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
Bit-by-Bit (Ours) beats EfficientQAT · WikiText2 PPL [LLaMA-3.2-1B, w2a16]
16.13 vs 21.48
- Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
Bit-by-Bit (Ours) beats EfficientQAT · WikiText2 PPL [LLaMA-3-8B, w2a16]
8.32 vs 11.17
- Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
ApiQ beats EfficientQAT · WikiText2 PPL [LLaMA-2-7B, 3-bit]
5.77 vs 5.81
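The size of each gap is easier to compare as a relative reduction in perplexity than as a raw difference. The following is a minimal sketch, not taken from any of the cited papers: the `results` list and the `relative_reduction` helper are hypothetical names introduced here, and the numbers are simply the pairs listed above.

```python
# Each tuple: (newer method, setting, newer-method PPL, EfficientQAT PPL).
# Values are copied from the head-to-head list above; verify against the papers.
results = [
    ("Bit-by-Bit", "LLaMA-2 7B, w2a16", 6.50, 7.39),
    ("Bit-by-Bit", "LLaMA-3.2-3B, w2a16", 11.02, 13.31),
    ("Bit-by-Bit", "LLaMA-2 7B, w2a2", 7.72, 9.71),
    ("Bit-by-Bit", "LLaMA-3.2-3B, w2a2", 13.87, 20.19),
    ("Bit-by-Bit", "LLaMA-3.2-1B, w2a16", 16.13, 21.48),
    ("Bit-by-Bit", "LLaMA-3-8B, w2a16", 8.32, 11.17),
    ("ApiQ", "LLaMA-2-7B, 3-bit", 5.77, 5.81),
]


def relative_reduction(new_ppl: float, baseline_ppl: float) -> float:
    """Fraction by which new_ppl is lower than baseline_ppl (hypothetical helper)."""
    return (baseline_ppl - new_ppl) / baseline_ppl


for method, setting, new_ppl, base_ppl in results:
    gap = relative_reduction(new_ppl, base_ppl)
    print(f"{method:<11} {setting:<22} {new_ppl:>6.2f} vs {base_ppl:>6.2f}  ({gap:.1%} lower)")
```

Read this way, the reported gaps vary widely: the 3-bit ApiQ result is under 1% lower than EfficientQAT, while the LLaMA-3.2-3B w2a2 result is roughly 31% lower. Relative perplexity is only a rough yardstick, since the settings, models, and evaluation details differ across papers.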
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- STaR-Quant · STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models · Jun 3, 2026
- May 26, 2026
- May 1, 2026
- Bit-by-Bit · Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs · Apr 9, 2026
- Benford-Quant · Benford's Law as a Distributional Prior for Post-Training Quantization of Large Language Models · Jan 29, 2026
- Hestia · HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs · Jan 28, 2026
- Layer-Wise High-Impact Parameter Ratio Optimization · Layer-Wise High-Impact Parameter Ratio Optimization in Post-Training Quantization for Large Language Models · Nov 21, 2025
- Sep 28, 2025