AQLM
LLM quantization
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 5 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites AQLM as a baseline.
“Each codebook is 1MiB. During inference, weights are read from these codebook in an essentialy random access pattern, meaning that the entire codebook must fit in L1 cache to enable fast inference (even L2 cache is too slow). However, 1MiB is larger than any current GPU's L1 cache (the H100 has 256KB), so AQLM inference suffers from high cache miss rates and is actually slower than FP16 on modern GPUs”
— QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
“AQLM proposes learning free-form VQs for different groups, which allows for more flexible quantization. However, this approach has the drawback that decoding requires the lookup operation, which is computationally more expensive than QUIP# and other existing PTQ methods.”
— Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression
“Among PTQ methods, the vector quantization method AQLM effectively mitigates some of the quantization loss, achieving 64.1 points, it falls 10.5 points short of full precision. The best quantization-aware training method, EfficientQAT, still suffers a 9.1-point decline in average accuracy.”
— ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
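The cache argument in the QuIP# quote can be checked with simple arithmetic. The sketch below is illustrative only. The codebook layout (2^16 entries, 8 weights per entry, FP16 storage) is an assumption that the quotes do not state; it is chosen because it reproduces the quoted 1 MiB. The 256 KB L1 size is the figure given in the quote. The helper names `codebook_bytes` and `decode` are hypothetical, and `decode` is a plain-Python stand-in for the lookup step the critiques refer to, not AQLM's actual inference kernel.

```python
# Back-of-the-envelope check of the cache argument in the QuIP# quote.
# ASSUMPTION: one codebook has 2**16 entries of 8 FP16 weights (2 bytes each).
# This layout is illustrative; only the 1 MiB and 256 KB figures come from the quote.

KIB = 1024
MIB = 1024 * KIB


def codebook_bytes(code_bits: int, group_size: int, bytes_per_weight: int) -> int:
    """Size in bytes of one dense codebook with 2**code_bits entries."""
    return (2 ** code_bits) * group_size * bytes_per_weight


def decode(codes, codebook):
    """Rebuild a weight row by fetching one codebook entry per code."""
    row = []
    for c in codes:
        # The index depends on the data, so memory access is effectively random.
        row.extend(codebook[c])
    return row


size = codebook_bytes(code_bits=16, group_size=8, bytes_per_weight=2)
l1_bytes = 256 * KIB  # H100 L1 size as given in the quote

print(size == MIB)      # True
print(size / l1_bytes)  # 4.0

# Toy codebook with 3 entries of 2 weights each, to show the lookup itself.
tiny_codebook = [[0.0, 0.1], [0.5, -0.5], [1.0, 2.0]]
print(decode([2, 0, 1], tiny_codebook))  # [1.0, 2.0, 0.0, 0.1, 0.5, -0.5]
```

Under these assumptions the codebook is four times the quoted L1 capacity, which is the basis of the cache-miss critique.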
Beaten on benchmarks
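Head-to-head results where a newer method reports beating AQLM. Each entry gives the newer method's value first and AQLM's second; lower is better for perplexity (including Wiki2 and C4), higher is better for the accuracy benchmarks. Values are copied from the source paper's tables, so verify them against the cited paper.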
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats AQLM · perplexity [2-bit]
20.11 vs 24.62
- QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
QuIP# beats AQLM · Wiki2 [2-70B, 3 bits]
3.35 vs 3.36
- QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
QuIP# beats AQLM · C4 [2-70B, 3 bits]
5.15 vs 5.17
- QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
QuIP# beats AQLM · ArcC [2-70, 3 bits]
50.9 vs 50.0
- QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
QuIP# beats AQLM · ArcE [2-70, 3 bits]
77.7 vs 77.6
- QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
QuIP# beats AQLM · ArcC [2-70, 2 bits]
48.7 vs 47.9
- Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · PIQA [L2-7B 2.00 bits]
77.48 vs 74.92
- Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · ARC-e [L2-7B 2.00 bits]
68.1 vs 66.5
- Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · ARC-c [L2-7B 2.00 bits]
38.14 vs 34.9
- Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · HellaSwag [L2-7B 2.00 bits]
53.75 vs 50.88
- Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · Winogrande [L2-7B 2.00 bits]
65.98 vs 62.43
- Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
LieQ beats AQLM · PIQA [L2-7B 3.00 bits]
77.31 vs 76.88
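To compare how large the reported gaps are, the margins can be computed directly from the listed values. This is a minimal sketch over a subset of the entries above, assuming the newer method's value is listed first and that Wiki2, C4, and "perplexity" are lower-is-better while the other metrics are accuracies. The `margin` helper and the tuple layout are hypothetical names introduced for illustration.

```python
# Margins by which each newer method leads AQLM, from values listed above.
# ASSUMPTION: newer method's value is first; perplexity-style metrics are
# lower-is-better, accuracy metrics are higher-is-better.

# (method, metric, setting, newer_value, aqlm_value, lower_is_better)
results = [
    ("SEPTQ", "perplexity", "2-bit", 20.11, 24.62, True),
    ("QuIP#", "Wiki2", "2-70B, 3 bits", 3.35, 3.36, True),
    ("QuIP#", "C4", "2-70B, 3 bits", 5.15, 5.17, True),
    ("QuIP#", "ArcC", "2-70, 3 bits", 50.9, 50.0, False),
    ("LieQ", "PIQA", "L2-7B 2.00 bits", 77.48, 74.92, False),
    ("LieQ", "ARC-c", "L2-7B 2.00 bits", 38.14, 34.9, False),
]


def margin(newer: float, aqlm: float, lower_is_better: bool) -> float:
    """Return the lead of the newer method; positive means it is ahead."""
    diff = aqlm - newer if lower_is_better else newer - aqlm
    return round(diff, 2)


for method, metric, setting, newer, aqlm, lower in results:
    lead = margin(newer, aqlm, lower)
    print(f"{method:6s} {metric:10s} [{setting}] margin = {lead:.2f}")
```

On these entries the QuIP# perplexity leads are 0.01 and 0.02, while the SEPTQ and LieQ leads at 2 bits are several points, so the listed wins differ considerably in size.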
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 29, 2026
- LFQ · LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs · May 28, 2026
- ADMM-Q · ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models · May 11, 2026
- May 6, 2026
- Apr 11, 2026
- Jan 21, 2026
- Grouped Lattice Vector Quantization (GLVQ) · Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression · Oct 23, 2025
- Sep 28, 2025
- Bi-VLM · Bi-VLM: Pushing Ultra-Low Precision Post-Training Quantization Boundaries in Vision-Language Models · Sep 23, 2025
- Sep 18, 2025