Living systematic review

LLM quantization

This review covers methods that compress LLM weights and activations to low bit-widths (4-bit and below) for cheaper inference while preserving quality. Topics include outlier handling, rotation, and post-training quantization.

97 papers · 151 critique receipts · 1,451 benchmark results · updated Jun 18, 2026

Most-superseded baselines

Methods are ranked by how many distinct papers critique them or beat them on benchmarks. These are the standard baselines that newer work routinely measures against.

1. GPTQ (sub-problem: GPTQ)
   GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
   13 papers critique it · 28 beat it on benchmarks
2. AWQ (sub-problem: GPTQ)
   AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
   9 papers critique it · 15 beat it on benchmarks
3. SmoothQuant (sub-problem: SmoothQuant)
   SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
   13 papers critique it · 9 beat it on benchmarks
4. QuaRot (sub-problem: SmoothQuant)
   QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
   7 papers critique it · 12 beat it on benchmarks
5. OmniQuant (sub-problem: GPTQ)
   OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
   2 papers critique it · 14 beat it on benchmarks
6. RTN (sub-problem: RTN)
   2 papers critique it · 10 beat it on benchmarks
7. SpinQuant (sub-problem: SmoothQuant)
   3 papers critique it · 8 beat it on benchmarks
8. AQLM (sub-problem: GPTQ)
   3 papers critique it · 5 beat it on benchmarks
9. QuIP (sub-problem: GPTQ)
   QuIP: 2-Bit Quantization of Large Language Models With Guarantees
   3 papers critique it · 4 beat it on benchmarks
10. SVDQuant (sub-problem: SmoothQuant)
   SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
   3 papers critique it · 4 beat it on benchmarks
11. PB-LLM (sub-problem: GPTQ)
   PB-LLM: Partially Binarized Large Language Models
   4 papers critique it · 3 beat it on benchmarks
12. FlatQuant (sub-problem: SmoothQuant)
   FlatQuant: Flatness Matters for LLM Quantization
   2 papers critique it · 4 beat it on benchmarks
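
The ranking above can be approximated from the two counts shown for each method. The following is a minimal sketch, not the review's own scoring code: it assumes that no paper appears in both the critique and the benchmark column, so that the sum of the two counts equals the number of distinct papers. If the columns overlap, the sum is only an upper bound. The function name `supersession_score` is a hypothetical label introduced here.

```python
# Counts copied from the ranked list: (method, papers critiquing it, papers beating it).
baselines = [
    ("GPTQ", 13, 28),
    ("AWQ", 9, 15),
    ("SmoothQuant", 13, 9),
    ("QuaRot", 7, 12),
    ("OmniQuant", 2, 14),
    ("RTN", 2, 10),
    ("SpinQuant", 3, 8),
    ("AQLM", 3, 5),
    ("QuIP", 3, 4),
    ("SVDQuant", 3, 4),
    ("PB-LLM", 4, 3),
    ("FlatQuant", 2, 4),
]


def supersession_score(critiques: int, beats: int) -> int:
    # Assumption: the two columns count disjoint sets of papers.
    # If a paper both critiques and beats a method, this overcounts.
    return critiques + beats


# sorted() is stable, so methods with equal scores keep their listed order.
ranked = sorted(
    baselines,
    key=lambda row: supersession_score(row[1], row[2]),
    reverse=True,
)

for rank, (name, critiques, beats) in enumerate(ranked, start=1):
    print(f"{rank:>2}  {name:<12} {supersession_score(critiques, beats)}")
```

Under this assumption the sorted order matches the listed ranks. QuIP, SVDQuant, and PB-LLM tie on the summed count, so their relative order here depends only on input order; how the review breaks such ties is not stated.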

Sub-problems

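Methods that compete on the same benchmarks cluster into distinct sub-problems. Each cluster is labeled with a representative method and its method count, followed by up to six of its members.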

GPTQ · 26 methods

GPTQ · AWQ · OmniQuant · AQLM · QuIP · PB-LLM

SmoothQuant · 33 methods

SmoothQuant · QuaRot · SpinQuant · SVDQuant · FlatQuant · AffineQuant

RTN · 30 methods

RTN · BitNet · QLoRA · EfficientQAT · QTIP · ARB-LLM

GPTAQ · 12 methods

GPTAQ · MBQ · DuQuant · MASQuant · QSLAW · QSVD

PACT · 10 methods

PACT · LSQ · N2UQ · GPLQ · LSQ+ · DiffQ

QDrop · 8 methods

QDrop · PD-Quant · FIMA-Q · AdaLog · EasyQuant · MGRQ

FlexRound · 6 methods

FlexRound · LLM.int8() · ZeroQuant · SplitQuantV2 · LRQ · AdpQ

AMQ · 8 methods

AMQ · SFMP · HAQ · HAWQ · MixLLM · BitStack

AQ-SGD · 3 methods

AQ-SGD · TAH-Quant (Tile-wise Adaptive Hadamard Quantization) · AMAQ

QServe · 3 methods

QServe · APEX4 · LiquidGEMM
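
One plausible way to form clusters like those above is to treat methods as nodes, link any two that report results on the same benchmark, and take connected components. The review does not state its actual clustering procedure, so the sketch below is an assumption. The benchmark names (`bench_a` and so on) and the method-to-benchmark pairings are hypothetical placeholders, not data from the review, and `cluster_by_shared_benchmark` is a name introduced here.

```python
from collections import defaultdict

# Hypothetical (method, benchmark) pairs. The benchmark names are placeholders.
results = [
    ("GPTQ", "bench_a"),
    ("AWQ", "bench_a"),
    ("OmniQuant", "bench_b"),
    ("AWQ", "bench_b"),
    ("SmoothQuant", "bench_c"),
    ("QuaRot", "bench_c"),
]


def cluster_by_shared_benchmark(pairs):
    """Group methods into connected components linked by shared benchmarks."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_on = {}  # benchmark -> first method seen on it
    for method, bench in pairs:
        find(method)  # register the method even if it shares no benchmark
        if bench in first_on:
            union(method, first_on[bench])
        else:
            first_on[bench] = method

    groups = defaultdict(set)
    for method in list(parent):
        groups[find(method)].add(method)
    return sorted(sorted(group) for group in groups.values())


clusters = cluster_by_shared_benchmark(results)
print(clusters)
```

With this toy input, AWQ bridges `bench_a` and `bench_b`, so GPTQ, AWQ, and OmniQuant fall into one component while SmoothQuant and QuaRot form another. Connected components merge clusters aggressively: a single shared benchmark is enough to join two groups, so a real pipeline would likely need a stricter linking rule.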

The frontier

Recent methods not yet superseded in the knowledge base.