Living systematic review
LLM quantization
Compressing LLM weights and activations to low bit-widths (4-bit and below) for cheaper inference while preserving quality, covering outlier handling, rotation, and post-training quantization.
97 papers · 151 critique receipts · 1,451 benchmark results · updated Jun 18, 2026
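The sketch below is a toy illustration of the core problem, not the algorithm of any method listed here. It assumes symmetric round-to-nearest (RTN) 4-bit quantization with a single scale per group of weights. In that setting, one outlier sets the scale and pushes every small weight to zero. Rotating the group with an orthonormal Walsh-Hadamard transform before quantizing, loosely in the spirit of rotation-based methods such as QuaRot, spreads the outlier's energy across all entries. The function names (`quantize_int4_symmetric`, `hadamard_transform`, `mse`) and the weight values are hypothetical.

```python
import math


def quantize_int4_symmetric(values: list[float]) -> list[float]:
    """Round-to-nearest symmetric 4-bit quantization of one group, returned dequantized."""
    qmax = 7  # signed INT4 spans [-8, 7]; this sketch uses the symmetric range [-7, 7]
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0.0] * len(values)
    scale = amax / qmax  # one scale per group, set by the largest magnitude
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def hadamard_transform(values: list[float]) -> list[float]:
    """Orthonormal fast Walsh-Hadamard transform (its own inverse); length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)  # normalize so the transform is orthogonal
    return [v * norm for v in out]


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# One hypothetical group of weights: small values plus a single large outlier.
weights = [0.1, -0.2, 0.15, -0.05, 0.12, -0.1, 0.08, 8.0]

# Direct quantization: the outlier sets the scale, so every small weight rounds to zero.
direct_err = mse(weights, quantize_int4_symmetric(weights))

# Rotate, quantize, rotate back: the rotation spreads the outlier across all entries.
rotated = hadamard_transform(weights)
restored = hadamard_transform(quantize_int4_symmetric(rotated))
rotated_err = mse(weights, restored)

print(f"direct MSE:  {direct_err:.5f}")
print(f"rotated MSE: {rotated_err:.5f}")
```

Because the normalized Hadamard matrix is orthogonal and its own inverse, the reconstruction error after rotating back equals the quantization error in the rotated basis. For this particular vector, the rotated error comes out roughly three times smaller than the direct one.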
Most-superseded baselines
Ranked by how many distinct papers critique or beat each method. These are the standard baselines newer work routinely measures against.
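The page does not publish its ranking code. The following is a minimal sketch of one reading of "distinct papers critique or beat", assuming each receipt records a citing paper, a target method, and whether the paper critiques the method or beats it on a benchmark. Under this reading, a paper that does both is counted once. That is why the two per-kind counts shown for each method can overlap and need not add up to the ranking key. The `Receipt` class, `rank_superseded` function, and paper/method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Hypothetical record: `paper` critiques `method` or beats it on a benchmark."""
    paper: str
    method: str
    kind: str  # "critique" or "beat"


def rank_superseded(receipts):
    """Return (method, distinct_papers, critiques, beats) rows, most-superseded first."""
    critiques = defaultdict(set)
    beats = defaultdict(set)
    for r in receipts:
        if r.kind == "critique":
            critiques[r.method].add(r.paper)
        elif r.kind == "beat":
            beats[r.method].add(r.paper)
        else:
            raise ValueError(f"unknown receipt kind: {r.kind!r}")

    rows = []
    for method in set(critiques) | set(beats):
        # A paper that both critiques and beats a method is counted once.
        distinct = len(critiques[method] | beats[method])
        rows.append((method, distinct, len(critiques[method]), len(beats[method])))
    # Most distinct papers first; ties broken by name for a deterministic order.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Hypothetical receipts: P1 both critiques and beats method A.
receipts = [
    Receipt("P1", "A", "critique"),
    Receipt("P1", "A", "beat"),
    Receipt("P2", "A", "beat"),
    Receipt("P3", "B", "critique"),
    Receipt("P4", "B", "critique"),
    Receipt("P5", "B", "beat"),
]
for row in rank_superseded(receipts):
    print(row)
```

Method B ranks first with three distinct papers, while A has three receipts from only two distinct papers.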
- 1. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (sub-problem: GPTQ)
13 papers critique it · 28 beat it on benchmarks
- 2. AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (sub-problem: GPTQ)
9 papers critique it · 15 beat it on benchmarks
- 3. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (sub-problem: SmoothQuant)
13 papers critique it · 9 beat it on benchmarks
- 4. QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs (sub-problem: SmoothQuant)
7 papers critique it · 12 beat it on benchmarks
- 5. OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models (sub-problem: GPTQ)
2 papers critique it · 14 beat it on benchmarks
- 9. QuIP: 2-Bit Quantization of Large Language Models With Guarantees (sub-problem: GPTQ)
3 papers critique it · 4 beat it on benchmarks
- 10. SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models (sub-problem: SmoothQuant)
3 papers critique it · 4 beat it on benchmarks
- 11. PB-LLM: Partially Binarized Large Language Models (sub-problem: GPTQ)
4 papers critique it · 3 beat it on benchmarks
- 12. FlatQuant: Flatness Matters for LLM Quantization (sub-problem: SmoothQuant)
2 papers critique it · 4 beat it on benchmarks
Sub-problems
Methods that compete on the same benchmarks cluster into distinct sub-problems.
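The page does not say how the clusters are formed. As a minimal sketch under an assumed rule, methods can be treated as nodes in a graph, with an edge wherever two methods report results on at least `min_shared` common benchmarks. Each sub-problem is then a connected component, found here with union-find. The threshold, the benchmark labels, and the `cluster_methods` function are hypothetical, and the real grouping may use a different criterion.

```python
from collections import defaultdict
from itertools import combinations


def cluster_methods(results, min_shared=2):
    """Group methods into connected components of a "shares >= min_shared benchmarks" graph.

    results: iterable of (method, benchmark) pairs, one per reported result.
    Returns sorted member lists, largest cluster first.
    """
    benchmarks = defaultdict(set)
    for method, bench in results:
        benchmarks[method].add(bench)

    parent = {m: m for m in benchmarks}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps trees shallow
            m = parent[m]
        return m

    # Link every pair of methods that compete on enough common benchmarks.
    for a, b in combinations(sorted(benchmarks), 2):
        if len(benchmarks[a] & benchmarks[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for m in benchmarks:
        groups[find(m)].append(m)
    clusters = [sorted(members) for members in groups.values()]
    return sorted(clusters, key=lambda c: (-len(c), c[0]))


# Hypothetical (method, benchmark) results; benchmark labels are made up.
results = [
    ("SmoothQuant", "w8a8-ppl"), ("SmoothQuant", "w4a4-ppl"),
    ("QuaRot", "w4a4-ppl"), ("QuaRot", "w8a8-ppl"), ("QuaRot", "zero-shot"),
    ("FlexRound", "w4a16-ppl"), ("FlexRound", "w3a16-ppl"),
    ("LRQ", "w4a16-ppl"), ("LRQ", "w3a16-ppl"),
]
print(cluster_methods(results))
```

Requiring more than one shared benchmark is a design choice in this sketch. A single ubiquitous benchmark would otherwise merge every method into one cluster.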
- SmoothQuant · 33 methods, including SmoothQuant, QuaRot, SpinQuant, SVDQuant, FlatQuant, AffineQuant
- FlexRound · 6 methods: FlexRound, LLM.int8(), ZeroQuant, SplitQuantV2, LRQ, AdpQ
- AQ-SGD · 3 methods: AQ-SGD, TAH-Quant (Tile-wise Adaptive Hadamard Quantization), AMAQ
- QServe · 3 methods: QServe, APEX4, LiquidGEMM
The frontier
Recent methods not yet superseded in the knowledge base.
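A minimal sketch of how a frontier list like this could be derived. It assumes each method has a publication date and a set of papers that critique or beat it, and it treats a method as being on the frontier when that set is empty and the method is newer than an optional cutoff. The `frontier` function, the `since` cutoff, and the entries `OldMethod` and `Unbeaten2024` are hypothetical. The three dated entries are taken from the list below.

```python
from datetime import date


def frontier(published, superseded_by, since=None):
    """Methods with no critiquing or beating paper, newest first.

    published: method name -> publication date
    superseded_by: method name -> set of papers that critique or beat it
    since: optional cutoff date; older methods are not treated as "recent"
    """
    alive = [
        (method, day)
        for method, day in published.items()
        if not superseded_by.get(method) and (since is None or day >= since)
    ]
    # Newest first; ties broken alphabetically for a stable listing.
    alive.sort(key=lambda item: (-item[1].toordinal(), item[0]))
    return alive


published = {
    "FAIR-Calib": date(2026, 6, 4),
    "STaR-Quant": date(2026, 6, 3),
    "LFQ": date(2026, 5, 28),
    "OldMethod": date(2025, 1, 15),     # hypothetical, superseded below
    "Unbeaten2024": date(2024, 3, 1),   # hypothetical, never superseded but old
}
superseded_by = {"OldMethod": {"LFQ"}}  # hypothetical receipt
print(frontier(published, superseded_by, since=date(2026, 5, 1)))
```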
- Jun 7, 2026
- Jun 5, 2026
- FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models · Jun 4, 2026
- STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models · Jun 3, 2026
- May 29, 2026
- LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs · May 28, 2026
- May 26, 2026
- May 25, 2026
- May 19, 2026
- May 18, 2026
- ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models · May 11, 2026
- May 11, 2026