AWQ
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
LLM quantization · first seen Jun 1, 2023
heavily superseded — a standard baseline that newer methods routinely beat
9 papers critique it · 15 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites AWQ as a baseline.
“However, if the scaling factor is too large, it will increase the quantization loss of non-outlier weights, while if the scaling factor is too small, it cannot protect these outlier weights well.”
— SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
“Most works only focus on optimizing distribution transformation or weight clipping ranges ~awq, outlier-plus, shao2023omniquant. While being straightforward, they prove inadequate for extremely low-bit scenarios due to the constrained optimization space.”
— TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
“while AWQ falls apart at even 2.15 bits omniquant and OmniQuant produces unusable models at 2 bits, produces high quality models that are close to OmniQuant 3 bit models.”
— QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
“as these techniques do not involve gradient-based optimization, unless task-specific calibration data is utilized, they can suffer substantial accuracy degradation on more challenging benchmarks, particularly text generation”
— LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
“existing PTQ approaches for LLMs minimize the layer-wise reconstruction loss while treating all tokens uniformly lin2023awq,frantar2022gptq,li2025gptqv2, without accounting for token-level informativeness or importance. Such a token-agnostic design inevitably biases the quantized model toward dominant but redundant visual features”
— VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
“We further observe that generic PTQ methods such as GPTQ and AWQ suffer significant performance degradation under W4A16, highlighting the challenge of directly applying standard quantization techniques to VGGT.”
— QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
“we observe that GPTQ consistently outperforms AWQ across most tasks”
— Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
“For example, awq determine the scaling coefficients based on the magnitude of the activations.”
— Exploring Model Invariance with Discrete Search for Ultra-Low-Bit Quantization
“SmoothQuant and AWQ algorithms also require calibration data to perform quantization.”
— AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs
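Several of the critiques above concern AWQ's per-channel scaling: the scales are derived from activation magnitudes on calibration data, and a scale that is too large or too small hurts in different ways. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. It assumes per-row symmetric round-to-nearest quantization (AWQ itself uses grouped quantization and additional scale normalization), and every function name here is hypothetical.

```python
def quantize_row(row, n_bits=4):
    """Symmetric round-to-nearest fake-quantization of one weight row."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return list(row)
    step = max_abs / qmax
    # Round to the integer grid, clamp, then map back to floating point.
    return [max(-qmax - 1, min(qmax, round(w / step))) * step for w in row]


def linear(W, X):
    """Y[o][n] = sum_i W[o][i] * X[n][i] for weights W and calibration inputs X."""
    return [[sum(w * x for w, x in zip(w_row, x_row)) for x_row in X]
            for w_row in W]


def output_error(W, X, scales, n_bits=4):
    """Mean squared output error when input channel i is scaled by scales[i].

    Weights are multiplied by the scale before quantization and activations
    are divided by it, so the unquantized product is left unchanged.
    """
    W_q = [quantize_row([w * s for w, s in zip(row, scales)], n_bits)
           for row in W]
    X_s = [[x / s for x, s in zip(row, scales)] for row in X]
    Y_ref, Y_q = linear(W, X), linear(W_q, X_s)
    count = len(Y_ref) * len(Y_ref[0])
    return sum((a - b) ** 2
               for ra, rb in zip(Y_ref, Y_q)
               for a, b in zip(ra, rb)) / count


def search_scales(W, X, n_bits=4, n_grid=20):
    """Grid-search alpha in [0, 1] with scales[i] = mean(|X[:, i]|) ** alpha.

    alpha = 0 gives all-ones scales (plain round-to-nearest); larger alpha
    protects high-activation channels more, at the cost of the others.
    Returns (best_error, best_alpha, best_scales).
    """
    in_dim = len(W[0])
    act_mag = [sum(abs(row[i]) for row in X) / len(X) for i in range(in_dim)]
    best = None
    for k in range(n_grid + 1):
        alpha = k / n_grid
        scales = [max(m, 1e-8) ** alpha for m in act_mag]
        err = output_error(W, X, scales, n_bits)
        if best is None or err < best[0]:
            best = (err, alpha, scales)
    return best
```

Calling `search_scales(W, X)` with a weight matrix `W` (rows are output channels) and a batch of calibration inputs `X` returns the lowest-error exponent found. Because `alpha = 0` is part of the grid, the result is never worse than unscaled round-to-nearest on the calibration batch, which also illustrates why the method depends on calibration data.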
Beaten on benchmarks
Head-to-head results where a newer method reports beating AWQ. Values are copied from the source paper's tables — verify against the cited paper.
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats AWQ · perplexity [2-bit]
68.62 vs 251.84
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats AWQ · perplexity [3-bit]
33.77 vs 36.74
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats AWQ · perplexity [4-bit]
29.00 vs 32.28
- Post Training Quantization of Large Language Models with Microscaling Formats
SQ+ beats AWQ · 0-shot accuracy [Llama2-7B, A:8, W:8, A:INT8, W:INT8]
58.71 vs 58.23
- Post Training Quantization of Large Language Models with Microscaling Formats
SQ+ beats AWQ · WikiText-2 perplexity [Llama2-7B, A:8, W:8, A:INT8, W:INT8]
5.15 vs 5.17
- QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design
QMC (no noise) beats AWQ · PPL [LLaMA-3.2B]
10.43 vs 12.67
- STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models
STaR-Quant beats AWQ · Avg. [W4A4 (4-bit weight and activation) on LLADA-8B]
57.07 vs 48.09
- STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models
STaR-Quant beats AWQ · Avg. [W4A4 (4-bit weight and activation) on LLADA-1.5-8B]
66.93 vs 58.49
- STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models
STaR-Quant beats AWQ · Avg. [W4A4 (4-bit weight and activation) on DREAM-7B]
63.59 vs 54.89
- Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
ApiQ beats AWQ · WikiText2 PPL [LLaMA-2-7B, 3-bit]
5.77 vs 6.24
- Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
ApiQ beats AWQ · C4 PPL [LLaMA-2-7B, 3-bit]
7.48 vs 7.84
- Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
ApiQ beats AWQ · WikiText2 PPL
5.12 vs 5.32
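Most of the comparisons above report perplexity, where the lower number wins; in the accuracy and Avg. rows the winning method has the higher number. As a reminder of what the perplexity figures measure, here is a small sketch using the standard definition (the exponential of the mean negative log-likelihood per token). The helper names are hypothetical, and the exact evaluation protocol of each cited paper is not reproduced here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities: exp(mean NLL)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(new_ppl, baseline_ppl):
    """Fraction by which new_ppl is lower than baseline_ppl."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Example with the 3-bit SEPTQ vs AWQ figures listed above.
gain = relative_reduction(33.77, 36.74)
print(f"relative perplexity reduction: {gain:.1%}")
```

A relative reduction is often easier to compare across rows than raw differences, since perplexities at 2 bits and at 4 bits sit on very different scales.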
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 29, 2026
- LFQ · LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs · May 28, 2026
- ADMM-Q · ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models · May 11, 2026
- May 6, 2026
- Apr 11, 2026
- Jan 21, 2026
- Grouped Lattice Vector Quantization (GLVQ) · Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression · Oct 23, 2025
- Sep 28, 2025
- Bi-VLM · Bi-VLM: Pushing Ultra-Low Precision Post-Training Quantization Boundaries in Vision-Language Models · Sep 23, 2025
- Sep 18, 2025