AWQ (LLM quantization): heavily superseded, a standard baseline that newer methods routinely beat. 9 papers critique it and 15 beat it on benchmarks, placing it #2 of 80 most-superseded. Sub-problem: cluster led by GPTQ. Newer alternatives in the same sub-problem include QVGGT, LFQ, ADMM-Q, OSAQ, SEPTQ.


AWQ

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

LLM quantization · first seen Jun 1, 2023


What papers say

Verbatim critique sentences, each from a paper that cites AWQ as a baseline.

“However, if the scaling factor is too large, it will increase the quantization loss of non-outlier weights, while if the scaling factor is too small, it cannot protect these outlier weights well.”
— SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
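The trade-off named in the SEPTQ quote can be reproduced on a toy layer. The sketch below follows the spirit of AWQ's scale search: per-input-channel scales s = mean|x|^alpha are computed from calibration activations, the weights are multiplied by s before quantization, 1/s is folded into the activations, and alpha is chosen by grid search on output error. The function names, the 3-bit symmetric round-to-nearest quantizer with one group per row, and the synthetic data are all illustrative assumptions, not AWQ's reference implementation.

```python
import random

def quantize_row(w, bits):
    """Symmetric round-to-nearest quantization of one row (a single group)."""
    qmax = 2 ** (bits - 1) - 1
    step = max(abs(v) for v in w) / qmax
    if step == 0:
        return list(w)
    return [max(-qmax - 1, min(qmax, round(v / step))) * step for v in w]

def output_error(w, X, s, bits):
    """Mean squared error of y = w.x after scaling w by s and x by 1/s."""
    wq = quantize_row([wi * si for wi, si in zip(w, s)], bits)
    err = 0.0
    for x in X:
        y_ref = sum(wi * xi for wi, xi in zip(w, x))
        y_q = sum(qi * xi / si for qi, xi, si in zip(wq, x, s))
        err += (y_ref - y_q) ** 2
    return err / len(X)

def search_alpha(w, X, bits=3, grid=20):
    """Grid-search alpha in [0, 1] for scales s_i = mean|x_i| ** alpha."""
    mag = [sum(abs(x[i]) for x in X) / len(X) for i in range(len(w))]
    results = []
    for k in range(grid + 1):
        alpha = k / grid
        s = [max(m, 1e-4) ** alpha for m in mag]
        results.append((output_error(w, X, s, bits), alpha))
    return min(results), results

# Toy data (assumed): channel 3 carries activations ~50x larger than the rest.
random.seed(0)
n = 16
w = [random.gauss(0.0, 1.0) for _ in range(n)]
X = [[random.gauss(0.0, 50.0 if i == 3 else 1.0) for i in range(n)] for _ in range(64)]
(best_err, best_alpha), results = search_alpha(w, X)
err_unscaled = results[0][0]  # alpha = 0 gives s = 1 everywhere (plain RTN)
print(f"alpha*={best_alpha:.2f}  err={best_err:.4f}  unscaled err={err_unscaled:.4f}")
```

Because alpha = 0 is on the grid, the searched error can never exceed plain round-to-nearest. Printing `results` across alpha is a simple way to look for both failure modes in the quote: too little scaling leaves the salient channel unprotected, while too much stretches the quantization range and coarsens every other weight.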
“Most works only focus on optimizing distribution transformation or weight clipping ranges [awq, outlier-plus, shao2023omniquant]. While being straightforward, they prove inadequate for extremely low-bit scenarios due to the constrained optimization space.”
— TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
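A weight-clipping search of the kind the TesseraQ quote mentions tunes a single scalar per group, which makes the "constrained optimization space" concrete. A minimal sketch, assuming symmetric 3-bit round-to-nearest and a plain weight-MSE objective; the function names and the sample group are hypothetical, and a real method may instead score candidate clip values on layer outputs computed from calibration activations.

```python
def quantize_sym(w, bits, clip):
    """Symmetric RTN over [-clip, clip]; values beyond clip saturate."""
    qmax = 2 ** (bits - 1) - 1
    step = clip / qmax
    return [max(-qmax - 1, min(qmax, round(v / step))) * step for v in w]

def search_clip(w, bits=3, steps=20, min_ratio=0.5):
    """Try clip = ratio * max|w| for ratio from 1.0 down to min_ratio."""
    wmax = max(abs(v) for v in w)
    if wmax == 0:
        return 0.0, 1.0
    best = None
    for k in range(steps + 1):
        ratio = 1.0 - (1.0 - min_ratio) * k / steps
        dq = quantize_sym(w, bits, wmax * ratio)
        mse = sum((a - b) ** 2 for a, b in zip(w, dq)) / len(w)
        if best is None or mse < best[0]:  # keep the lowest weight MSE
            best = (mse, ratio)
    return best

# One weight group with a single large outlier (assumed data).
group = [0.12, -0.31, 0.05, 0.27, -0.08, 0.19, -0.22, 2.4]
best_mse, best_ratio = search_clip(group)
print(f"best clip ratio={best_ratio:.3f}  weight MSE={best_mse:.5f}")
```

The whole search space is one ratio per group, so however the ratio is chosen, the set of reachable quantized weights stays small; that is the limitation the quote attributes to clipping-only methods at extremely low bit widths.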
“while AWQ falls apart at even 2.15 bits [omniquant] and OmniQuant produces unusable models at 2 bits, [QuIP#] produces high quality models that are close to OmniQuant 3 bit models.”
— QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
“as these techniques do not involve gradient-based optimization, unless task-specific calibration data is utilized, they can suffer substantial accuracy degradation on more challenging benchmarks, particularly text generation”
— LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
“existing PTQ approaches for LLMs minimize the layer-wise reconstruction loss while treating all tokens uniformly [lin2023awq, frantar2022gptq, li2025gptqv2], without accounting for token-level informativeness or importance. Such a token-agnostic design inevitably biases the quantized model toward dominant but redundant visual features”
— VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
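The "token-agnostic" objective in the VLMQ quote is the layer-wise loss ||WX - ŴX||², which sums every calibration token with equal weight. The sketch adds an optional per-token weight c_t to show where token importance would enter; the function name is hypothetical, and how such weights would be chosen (VLMQ's Hessian augmentation) is not reproduced here.

```python
def reconstruction_loss(W, Wq, X, token_weights=None):
    """Layer-wise loss sum_t c_t * ||W x_t - Wq x_t||^2.

    token_weights=None gives the uniform, token-agnostic objective.
    """
    if token_weights is None:
        token_weights = [1.0] * len(X)
    total = 0.0
    for c, x in zip(token_weights, X):
        for row, qrow in zip(W, Wq):
            # Output difference for this row on token x.
            diff = sum((a - b) * xi for a, b, xi in zip(row, qrow, x))
            total += c * diff * diff
    return total

W = [[1.0, 2.0]]   # full-precision weights (one output row)
Wq = [[1.0, 1.5]]  # a quantized version of W
X = [[1.0, 1.0], [0.0, 2.0]]  # two calibration tokens
print(reconstruction_loss(W, Wq, X))              # → 1.25
print(reconstruction_loss(W, Wq, X, [2.0, 0.5]))  # → 1.0
```

With uniform weights the second token dominates the loss; reweighting tokens changes which errors the quantizer is pushed to fix, which is the lever the quote says uniform objectives ignore.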
“We further observe that generic PTQ methods such as GPTQ and AWQ suffer significant performance degradation under W4A16, highlighting the challenge of directly applying standard quantization techniques to VGGT.”
— QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
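W4A16 in the QVGGT quote denotes 4-bit weights with 16-bit activations, i.e. weight-only quantization. A minimal fake-quantization sketch, assuming asymmetric INT4 round-to-nearest with one scale and one integer zero point per group of consecutive weights; the group size of 4 is chosen only for readability, the function name is hypothetical, and real kernels store packed integers and dequantize on the fly, which this does not model.

```python
def fake_quant_int4(weights, group_size=4):
    """Quantize to asymmetric INT4 per group, then dequantize back to floats."""
    qmax = 15  # unsigned 4-bit codes 0..15
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0  # guard against a constant group
        zero = round(-lo / scale)        # integer zero point
        for v in group:
            q = max(0, min(qmax, round(v / scale) + zero))
            out.append((q - zero) * scale)
    return out

w = [0.0, 0.1, 0.2, 1.5, -1.0, 0.0, 0.5, 1.0]
dq = fake_quant_int4(w)
print([round(v, 3) for v in dq])
```

Activations never pass through the quantizer here, which is what distinguishes W4A16 from settings such as the W4A4 rows in the benchmark list.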
“we observe that GPTQ consistently outperforms AWQ across most tasks”
— Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
“For example, [Lin et al., AWQ] determine the scaling coefficients based on the magnitude of the activations.”
— Exploring Model Invariance with Discrete Search for Ultra-Low-Bit Quantization
“SmoothQuant and AWQ algorithms also require calibration data to perform quantization.”
— AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs

Beaten on benchmarks

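Head-to-head results where a newer method reports beating AWQ. Values are copied from the source paper's tables; verify them against the cited paper. In each entry the first value belongs to the newer method and the second to AWQ (lower is better for perplexity, higher is better for accuracy and averaged scores).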

SEPTQ beats AWQ · perplexity [2-bit]
68.62 vs 251.84
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats AWQ · perplexity [3-bit]
33.77 vs 36.74
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats AWQ · perplexity [4-bit]
29.00 vs 32.28
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SQ+ beats AWQ · 0-shot accuracy [Llama2-7B, A:8, W:8, A:INT8, W:INT8]
58.71 vs 58.23
Post Training Quantization of Large Language Models with Microscaling Formats
SQ+ beats AWQ · WikiText-2 perplexity [Llama2-7B, A:8, W:8, A:INT8, W:INT8]
5.15 vs 5.17
Post Training Quantization of Large Language Models with Microscaling Formats
QMC (no noise) beats AWQ · PPL [LLaMA-3.2B]
10.43 vs 12.67
QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design
STaR-Quant beats AWQ · Avg. [W4A4 (4-bit weight and activation) on LLADA-8B]
57.07 vs 48.09
STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models
STaR-Quant beats AWQ · Avg. [W4A4 (4-bit weight and activation) on LLADA-1.5-8B]
66.93 vs 58.49
STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models
STaR-Quant beats AWQ · Avg. [W4A4 (4-bit weight and activation) on DREAM-7B]
63.59 vs 54.89
STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models
ApiQ beats AWQ · WikiText2 PPL [LLaMA-2-7B, 3-bit]
5.77 vs 6.24
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
ApiQ beats AWQ · C4 PPL [LLaMA-2-7B, 3-bit]
7.48 vs 7.84
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
ApiQ beats AWQ · WikiText2 PPL
5.12 vs 5.32
Enhancing Ultra-Low-Bit Quantization of Large Language Models Through Saliency-Aware Partial Retraining
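Most entries above compare perplexity, the exponential of the mean per-token negative log-likelihood, so lower is better. A minimal sketch of that standard definition, plus the relative reduction implied by one listed pair (SEPTQ vs AWQ at 2 bits). The helper names are hypothetical, and each source paper's evaluation protocol (dataset, context length) may differ, so cross-paper comparisons need care.

```python
import math

def perplexity(token_logprobs):
    """exp(mean negative log-likelihood); log-probs are natural logs."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def relative_reduction(new, baseline):
    """Fraction by which `new` lowers a lower-is-better metric vs `baseline`."""
    return (baseline - new) / baseline

# A model assigning probability 0.25 to every token has perplexity 4
# (up to floating-point rounding).
print(perplexity([math.log(0.25)] * 8))
# SEPTQ vs AWQ, 2-bit perplexity, values copied from the entry above.
print(f"{relative_reduction(68.62, 251.84):.1%}")  # → 72.8%
```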

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.