SmoothQuant
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
LLM quantization · first seen Nov 18, 2022
heavily superseded — a standard baseline that newer methods routinely beat
13 papers critique it · 9 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites SmoothQuant as a baseline.
“SmoothQuant can incur a significant accuracy degradation of up to 15 points on Llama2-70B in W4A4 scenarios”
— OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension
“smoothing-based methods often transfer the burden of quantization from activations to weights without eliminating it.”
— SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
“existing PTQ approaches for LLMs minimize the layer-wise reconstruction loss while treating all tokens uniformly [lin2023awq, frantar2022gptq, li2025gptqv2], without accounting for token-level informativeness or importance. Such a token-agnostic design inevitably biases the quantized model toward dominant but redundant visual features”
— VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
“However, GPTQ and SmoothQuant (SQ), which are strong PTQ methods for pure LLMs, do not reliably improve performance in this multimodal setting.”
— Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients
“While these approaches effectively stabilize numerical distributions of language and 2D vision models, they generalize poorly to 3D geometry networks such as VGGT.”
— QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
“However, these methods cannot effectively quantize the LLMs to 4-bit weights and activations.”
— Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models
“Such methods provide useful insight that token distributions are highly non-uniform, yet their main lever is changing the bit-width.”
— When W4A4 Breaks Camouflaged Object Detection: Token-Group Dual-Constraint Activation Quantization
“SmoothQuant [xiao2023smoothquant] smooths the distribution of LayerNorm activations before quantization. Unfortunately, when performing low-bit (e.g., 4-bit) quantization, these schemes tend to underperform significantly”
— RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
“SmoothQuant suffers from non-negligible performance degradation for other open-source models such as Llama and Llama 2 with a 8-bit per-tensor static activation quantization scheme”
— LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
“we observe notable degradation on more complex tasks such as math reasoning and code generation”
— Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
“However, these methods operate primarily along the feature dimension and ignore correlations across the sequence dimension.”
— STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
“Though alleviating the outlier issues, the asymmetry in activations is still challenging to symmetric quantization.”
— MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer
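For context on the "transfer the burden" critique quoted above, here is a minimal sketch of the per-channel smoothing step described in the SmoothQuant paper, assuming the scale formula s_j = max|X_j|^alpha / max|W_j|^(1 - alpha) with alpha = 0.5. The toy matrices and every function name are illustrative assumptions; none of this is taken from the official implementation or from the citing papers.

```python
# Illustrative sketch only: pure-Python SmoothQuant-style smoothing on toy data.
# smooth_scales, apply_smoothing, matmul and abs_max are hypothetical helper names.

def smooth_scales(X, W, alpha=0.5):
    """One scale per input channel j: max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha)."""
    scales = []
    for j in range(len(W)):
        act_max = max(abs(row[j]) for row in X)   # activation range of channel j
        w_max = max(abs(v) for v in W[j])         # weight range of channel j
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    return scales

def apply_smoothing(X, W, scales):
    """Divide activation columns by s and multiply the matching weight rows by s."""
    X_s = [[x / s for x, s in zip(row, scales)] for row in X]
    W_s = [[w * s for w in row] for row, s in zip(W, scales)]
    return X_s, W_s

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def abs_max(M):
    return max(abs(v) for row in M for v in row)

# Toy activations (tokens x channels); channel 1 carries an outlier.
X = [[0.5, 40.0, -0.2],
     [-0.3, -60.0, 0.4]]
# Toy weights (channels x outputs).
W = [[0.2, -0.1],
     [0.05, 0.1],
     [-0.3, 0.25]]

s = smooth_scales(X, W)
X_s, W_s = apply_smoothing(X, W, s)

print("activation range before/after:", abs_max(X), abs_max(X_s))
print("weight range before/after:    ", abs_max(W), abs_max(W_s))
```

The layer output is mathematically unchanged, since (X / s) times (s * W) equals X times W. The activation range shrinks, but the weight range grows by the same per-channel factor, which is the effect the SpecQuant quote points to: the outlier is moved, not removed.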
Beaten on benchmarks
Head-to-head results where a newer method reports beating SmoothQuant. Values are copied from the source paper's tables — verify against the cited paper.
- QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
QuaRot beats SmoothQuant · PPL [4-bit, Llama 7B]
6.10 vs 83.12
- QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
QuaRot beats SmoothQuant · PPL [4-bit, Llama 13B]
5.40 vs 35.88
- Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
SplitQ beats SmoothQuant · Avg. [W4A8]
70.4 vs 56.9
- Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
SplitQ beats SmoothQuant · Avg. [W4A4]
69.6 vs 3.9
- SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
SpecQuant beats SmoothQuant · 0-shot^9 [4-16-16]
66.88 vs 62.79
- SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization
SpecQuant beats SmoothQuant · Wiki [4-16-16]
6.48 vs 8.12
- Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
MicroRotated-GPTQ beats SmoothQuant · Avg [NVFP4 W4A4]
75.84 vs 75.70
- QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
QVGGT beats SmoothQuant · Acc [W4A16]
0.031 vs 0.067
- QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
QVGGT beats SmoothQuant · Comp [W4A16]
0.035 vs 0.058
- QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer
QVGGT beats SmoothQuant · NC [W4A16]
0.849 vs 0.702
- RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
RepQuant beats SmoothQuant · Perplexity (lower is better) [4/4 precision]
13.97 vs 21586
- RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
RepQuant beats SmoothQuant · Average accuracy [4/4 precision]
50.01 vs 30.01
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 5, 2026
- FAIR-Calib · FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models · Jun 4, 2026
- May 25, 2026
- May 11, 2026
- Activation Residual Hessian Quantization (ARHQ) · Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization · Apr 30, 2026
- Apr 20, 2026
- Apr 14, 2026
- Mar 26, 2026
- Jan 29, 2026
- Reasoning-QAT · What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study · Jan 21, 2026
- Dec 3, 2025
- Adaptive transformation selection framework · Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models · Nov 21, 2025