Is SmoothQuant superseded?

SmoothQuant (LLM quantization) is heavily superseded: it is a standard baseline that newer methods routinely beat. 13 papers critique it and 9 beat it on benchmarks, placing it #3 of 80 most-superseded. It leads its own sub-problem cluster. Newer alternatives in the same sub-problem include OffQ, FAIR-Calib, InfoQuant, ConQuR, and Activation Residual Hessian Quantization (ARHQ).

SmoothQuant

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

LLM quantization · first seen Nov 18, 2022

What papers say

Verbatim critique sentences, each from a paper that cites SmoothQuant as a baseline.

“SmoothQuant can incur a significant accuracy degradation of up to 15 points on Llama2-70B in W4A4 scenarios”
— OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension
“smoothing-based methods often transfer the burden of quantization from activations to weights without eliminating it.”
— SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
“existing PTQ approaches for LLMs minimize the layer-wise reconstruction loss while treating all tokens uniformly [lin2023awq, frantar2022gptq, li2025gptqv2], without accounting for token-level informativeness or importance. Such a token-agnostic design inevitably biases the quantized model toward dominant but redundant visual features”
— VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
“However, GPTQ and SmoothQuant (SQ), which are strong PTQ methods for pure LLMs, do not reliably improve performance in this multimodal setting.”
— Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients
“While these approaches effectively stabilize numerical distributions of language and 2D vision models, they generalize poorly to 3D geometry networks such as VGGT.”
— QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
“However, these methods cannot effectively quantize the LLMs to 4-bit weights and activations.”
— Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models
“Such methods provide useful insight that token distributions are highly non-uniform, yet their main lever is changing the bit-width.”
— When W4A4 Breaks Camouflaged Object Detection: Token-Group Dual-Constraint Activation Quantization
“SmoothQuant [xiao2023smoothquant] smooths the distribution of LayerNorm activations before quantization. Unfortunately, when performing low-bit (e.g., 4-bit) quantization, these schemes tend to underperform significantly”
— RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
“SmoothQuant suffers from non-negligible performance degradation for other open-source models such as Llama and Llama 2 with a 8-bit per-tensor static activation quantization scheme”
— LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
“we observe notable degradation on more complex tasks such as math reasoning and code generation”
— Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
“However, these methods operate primarily along the feature dimension and ignore correlations across the sequence dimension.”
— STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
“Though alleviating the outlier issues, the asymmetry in activations is still challenging to symmetric quantization.”
— MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer
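
Several of the critiques above turn on SmoothQuant's smoothing step, which rescales each input channel so that activation outliers shrink while the matching weight rows grow. The sketch below illustrates that trade-off in NumPy. It is not taken from this page: the scale formula s_j = max|X_j|^alpha / max|W_j|^(1 - alpha) is the one commonly attributed to the SmoothQuant paper, and the function names, alpha = 0.5, the toy shapes, and the per-tensor round-to-nearest quantizer are all assumptions chosen for illustration.

```python
import numpy as np

def smooth_scales(x, w, alpha=0.5, eps=1e-8):
    """Per-input-channel factors s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    act_max = np.abs(x).max(axis=0)   # x: (tokens, in_features)
    w_max = np.abs(w).max(axis=1)     # w: (in_features, out_features)
    return np.maximum(act_max, eps) ** alpha / np.maximum(w_max, eps) ** (1 - alpha)

def fake_quant(t, bits):
    """Symmetric per-tensor round-to-nearest quantization, kept in floating point."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(t).max() / qmax
    return np.clip(np.round(t / scale), -qmax, qmax) * scale

def rel_error(x, w, bits):
    """Relative output error when both operands are quantized to `bits` bits."""
    ref = x @ w
    out = fake_quant(x, bits) @ fake_quant(w, bits)
    return np.linalg.norm(out - ref) / np.linalg.norm(ref)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))
x[:, 3] *= 50.0                       # hypothetical outlier activation channel
w = rng.normal(size=(16, 8))

s = smooth_scales(x, w)
x_s = x / s                           # activations divided per channel
w_s = w * s[:, None]                  # matching weight rows multiplied

# The rescaling is exact in full precision: (X / s) @ (s * W) == X @ W.
assert np.allclose(x @ w, x_s @ w_s)

print("activation max:", np.abs(x).max(), "->", np.abs(x_s).max())
print("weight max:    ", np.abs(w).max(), "->", np.abs(w_s).max())
for bits in (8, 4):
    print(f"W{bits}A{bits} relative error: "
          f"plain={rel_error(x, w, bits):.4f}, smoothed={rel_error(x_s, w_s, bits):.4f}")
```

The printed ranges show the activation peak falling and the weight peak rising, which is the "burden transfer" the SpecQuant quote describes. The error lines are there for inspection only; a toy per-tensor quantizer on random data should not be read as reproducing any result reported in the cited papers.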

Beaten on benchmarks

Head-to-head results where a newer method reports beating SmoothQuant. Values are copied from the source paper's tables — verify against the cited paper.

QuaRot beats SmoothQuant · PPL [4-bit, Llama 7B]
6.10 vs 83.12
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
QuaRot beats SmoothQuant · PPL [4-bit, Llama 13B]
5.40 vs 35.88
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
SplitQ beats SmoothQuant · Avg. [W4A8]
70.4 vs 56.9
Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
SplitQ beats SmoothQuant · Avg. [W4A4]
69.6 vs 3.9
Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
SpecQuant beats SmoothQuant · 0-shot^9 [4-16-16]
66.88 vs 62.79
SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
SpecQuant beats SmoothQuant · Wiki [4-16-16]
6.48 vs 8.12
SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
MicroRotated-GPTQ beats SmoothQuant · Avg [NVFP4 W4A4]
75.84 vs 75.70
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
QVGGT beats SmoothQuant · Acc (lower is better) [W4A16]
0.031 vs 0.067
QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
QVGGT beats SmoothQuant · Comp (lower is better) [W4A16]
0.035 vs 0.058
QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
QVGGT beats SmoothQuant · NC (higher is better) [W4A16]
0.849 vs 0.702
QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
RepQuant beats SmoothQuant · Perplexity (lower is better) [4/4 precision]
13.97 vs 21586
RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
RepQuant beats SmoothQuant · Average accuracy [4/4 precision]
50.01 vs 30.01
RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

Newer alternatives

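Recent methods in the same sub-problem, not yet superseded in the knowledge base. These include OffQ, FAIR-Calib, InfoQuant, ConQuR, and Activation Residual Hessian Quantization (ARHQ).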