OmniQuant
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
LLM quantization · first seen Aug 25, 2023
heavily superseded — a standard baseline that newer methods routinely beat
2 papers critique it · 14 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites OmniQuant as a baseline.
“while AWQ falls apart at even 2.15 bits and OmniQuant produces unusable models at 2 bits, [QuIP#] produces high quality models that are close to OmniQuant 3 bit models.”
— QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
“However, these methods cannot effectively quantize the LLMs to 4-bit weights and activations.”
— Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models
Beaten on benchmarks
Head-to-head results where a newer method reports beating OmniQuant. Values are copied from the source paper's tables — verify against the cited paper.
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats OmniQuant · perplexity [2-bit]
68.62 vs 75.43
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats OmniQuant · perplexity [3-bit]
33.77 vs 35.66
- SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats OmniQuant · perplexity [4-bit]
29.00 vs 29.45
- TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Perplexity [W2A16]
8.05 vs 37.37
- TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Perplexity [W2A16g128]
6.82 vs 11.06
- TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Perplexity [W2A16g64]
6.67 vs 9.62
- TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Perplexity [W3A16]
5.84 vs 6.58
- TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Perplexity [W4A16]
5.56 vs 5.74
- TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Avg. [W2A16g128]
59.27 vs 47.59
- QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
QuaRot beats OmniQuant · PPL [4-bit, Llama 7B]
6.10 vs 14.26
- QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
QuaRot beats OmniQuant · PPL [4-bit, Llama 13B]
5.40 vs 12.30
- Layer-Wise High-Impact Parameter Ratio Optimization in Post-Training Quantization for Large Language Models
Layer-Wise High-Impact Parameter Ratio Optimization beats OmniQuant · Perplexity [LLaMA-2-7B, W2A16]
14.64 vs 90.64
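
To put the perplexity gaps above on a common scale, the pairs can be turned into relative reductions. The following is a minimal sketch, not part of any cited paper: the helper name `relative_reduction` and the choice of rows are illustrative assumptions, and the numbers are copied from the list above. Each row comes from a different paper, possibly with a different model and evaluation set, so percentages are only meaningful within a row, not across rows.

```python
# (method, setting, method_ppl, omniquant_ppl), copied from the list above.
# Only perplexity rows are included: lower is better for perplexity, whereas
# the "Avg." accuracy row points the other way and is left out.
rows = [
    ("SEPTQ", "2-bit", 68.62, 75.43),
    ("TesseraQ", "W2A16", 8.05, 37.37),
    ("TesseraQ", "W4A16", 5.56, 5.74),
    ("QuaRot", "4-bit, Llama 7B", 6.10, 14.26),
]


def relative_reduction(new_ppl: float, baseline_ppl: float) -> float:
    """Fraction by which perplexity drops relative to the baseline.

    Hypothetical helper for illustration; assumes baseline_ppl > 0.
    """
    return (baseline_ppl - new_ppl) / baseline_ppl


for method, setting, new_ppl, base_ppl in rows:
    drop = relative_reduction(new_ppl, base_ppl)
    print(f"{method} [{setting}]: {drop:.1%} lower perplexity than OmniQuant")
```

Read this way, the 4-bit weight-only gaps are small while the 2-bit gaps are large, which matches the critique quoted earlier that OmniQuant degrades most at very low bit widths.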
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 29, 2026
- LFQ · LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs · May 28, 2026
- ADMM-Q · ADMM-Q: An Improved Hessian-based Weight Quantizer for Post-Training Quantization of Large Language Models · May 11, 2026
- May 6, 2026
- Apr 11, 2026
- Jan 21, 2026
- Grouped Lattice Vector Quantization (GLVQ) · Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression · Oct 23, 2025
- Sep 28, 2025
- Bi-VLM · Bi-VLM: Pushing Ultra-Low Precision Post-Training Quantization Boundaries in Vision-Language Models · Sep 23, 2025
- Sep 18, 2025