AQLM (LLM quantization): superseded. It is cited as a baseline and beaten by newer methods: 3 papers critique it and 5 report beating it on benchmarks, which ranks it #8 of the 80 most-superseded methods. Sub-problem: the cluster led by GPTQ. Newer alternatives in the same sub-problem include QVGGT, LFQ, ADMM-Q, OSAQ, and SEPTQ.

Is AQLM superseded? Critiques, benchmarks & alternatives

What papers say

Verbatim critique sentences, each from a paper that cites AQLM as a baseline.

“Each codebook is 1MiB. During inference, weights are read from these codebook in an essentialy random access pattern, meaning that the entire codebook must fit in L1 cache to enable fast inference (even L2 cache is too slow). However, 1MiB is larger than any current GPU's L1 cache (the H100 has 256KB), so AQLM inference suffers from high cache miss rates and is actually slower than FP16 on modern GPUs”
— QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
“AQLM proposes learning free-form VQs for different groups, which allows for more flexible quantization. However, this approach has the drawback that decoding requires the lookup operation, which is computationally more expensive than QUIP# and other existing PTQ methods.”
— Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression
“Among PTQ methods, the vector quantization method AQLM effectively mitigates some of the quantization loss, achieving 64.1 points, it falls 10.5 points short of full precision. The best quantization-aware training method, EfficientQAT, still suffers a 9.1-point decline in average accuracy.”
— ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
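
The first two critiques rest on the same mechanism: decoding looks up vectors in a large codebook at indices that are only known at run time. The sketch below is a minimal pure-Python illustration, not AQLM's actual kernel. The function names, the toy codebooks, and the configuration of 16-bit codes over groups of 8 FP16 weights are assumptions chosen to reproduce the quoted 1 MiB figure; the 256 KB L1 size is the H100 number given in the QuIP# quote.

```python
# Sketch: codebook footprint and lookup-based decoding for additive
# vector quantization. Sizes are illustrative assumptions, except
# L1_BYTES, which is the H100 figure from the QuIP# quote.

L1_BYTES = 256 * 1024


def codebook_bytes(code_bits: int, group_size: int, bytes_per_value: int = 2) -> int:
    """Bytes in one codebook: 2**code_bits entries of group_size values each."""
    return (2 ** code_bits) * group_size * bytes_per_value


def decode_group(codes, codebooks):
    """Reconstruct one weight group by summing one looked-up vector per codebook."""
    group_size = len(codebooks[0][0])
    out = [0.0] * group_size
    for code, book in zip(codes, codebooks):
        vec = book[code]  # data-dependent lookup: the index comes from the stored code
        for i in range(group_size):
            out[i] += vec[i]
    return out


# Assumed configuration: 16-bit codes, 8 weights per group, FP16 (2-byte) values.
size = codebook_bytes(code_bits=16, group_size=8)
print(size, size / L1_BYTES)  # 1048576 4.0

# Toy codebooks (2-bit codes, 2 weights per group) to show the decode step.
BOOK0 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
BOOK1 = [[0.5, 0.5], [-0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]]
print(decode_group([3, 1], [BOOK0, BOOK1]))  # [0.5, 1.5]
```

Under these assumptions a single codebook is four times the quoted L1 capacity, which is the basis of the cache-miss argument; the per-weight lookup in `decode_group` is the extra decoding cost the second critique refers to.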

Beaten on benchmarks

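Head-to-head results where a newer method reports beating AQLM. Each entry gives the newer method's value first, then AQLM's. Lower is better for perplexity (including the Wiki2 and C4 rows); higher is better for the accuracy benchmarks. Values are copied from the source papers' tables; verify them against the cited paper.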

SEPTQ beats AQLM · perplexity [2-bit]
20.11 vs 24.62
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
QuIP# beats AQLM · Wiki2 [2-70B, 3 bits]
3.35 vs 3.36
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
QuIP# beats AQLM · C4 [2-70B, 3 bits]
5.15 vs 5.17
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
QuIP# beats AQLM · ArcC [2-70B, 3 bits]
50.9 vs 50.0
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
QuIP# beats AQLM · ArcE [2-70B, 3 bits]
77.7 vs 77.6
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
QuIP# beats AQLM · ArcC [2-70B, 2 bits]
48.7 vs 47.9
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
LieQ beats AQLM · PIQA [L2-7B 2.00 bits]
77.48 vs 74.92
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · ARC-e [L2-7B 2.00 bits]
68.1 vs 66.5
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · ARC-c [L2-7B 2.00 bits]
38.14 vs 34.9
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · HellaSwag [L2-7B 2.00 bits]
53.75 vs 50.88
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · Winogrande [L2-7B 2.00 bits]
65.98 vs 62.43
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · PIQA [L2-7B 3.00 bits]
77.31 vs 76.88
Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
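
To compare the size of the reported gaps, the entries above can be reduced to signed margins. This is a small sketch: the tuple layout, `LOWER_IS_BETTER`, and `margin` are hypothetical helpers introduced for illustration, and it assumes the perplexity, Wiki2, and C4 rows are perplexities (lower is better) while the remaining rows are accuracies (higher is better).

```python
# (newer method, benchmark, bit-width, newer value, AQLM value),
# copied from the list above.
RESULTS = [
    ("SEPTQ", "perplexity", "2-bit", 20.11, 24.62),
    ("QuIP#", "Wiki2", "3 bits", 3.35, 3.36),
    ("QuIP#", "C4", "3 bits", 5.15, 5.17),
    ("QuIP#", "ArcC", "3 bits", 50.9, 50.0),
    ("QuIP#", "ArcE", "3 bits", 77.7, 77.6),
    ("QuIP#", "ArcC", "2 bits", 48.7, 47.9),
    ("LieQ", "PIQA", "2.00 bits", 77.48, 74.92),
    ("LieQ", "ARC-e", "2.00 bits", 68.1, 66.5),
    ("LieQ", "ARC-c", "2.00 bits", 38.14, 34.9),
    ("LieQ", "HellaSwag", "2.00 bits", 53.75, 50.88),
    ("LieQ", "Winogrande", "2.00 bits", 65.98, 62.43),
    ("LieQ", "PIQA", "3.00 bits", 77.31, 76.88),
]

# Assumption: these rows report perplexity, so a smaller value wins.
LOWER_IS_BETTER = {"perplexity", "Wiki2", "C4"}


def margin(benchmark: str, newer: float, aqlm: float) -> float:
    """Return how far the newer method is ahead; positive means it beats AQLM."""
    diff = aqlm - newer if benchmark in LOWER_IS_BETTER else newer - aqlm
    return round(diff, 2)


for method, benchmark, bits, newer, aqlm in RESULTS:
    print(f"{method:6s} {benchmark:11s} {bits:10s} margin = {margin(benchmark, newer, aqlm)}")
```

On these numbers the QuIP# perplexity margins at 3 bits are 0.01 and 0.02, while the LieQ accuracy margins at 2.00 bits range from 1.6 to 3.55 points.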

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base, include QVGGT, LFQ, ADMM-Q, OSAQ, and SEPTQ.