Is OmniQuant superseded?

OmniQuant (LLM quantization) is heavily superseded: a standard baseline that newer methods routinely beat. 2 papers critique it and 14 beat it on benchmarks, placing it #5 of 80 most-superseded. Sub-problem: cluster led by GPTQ. Newer alternatives in the same sub-problem include QVGGT, LFQ, ADMM-Q, OSAQ, and SEPTQ.



OmniQuant

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

LLM quantization · first seen Aug 25, 2023


What papers say

Verbatim critique sentences, each from a paper that cites OmniQuant as a baseline.

“while AWQ falls apart at even 2.15 bits and OmniQuant produces unusable models at 2 bits, [QuIP#] produces high quality models that are close to OmniQuant 3 bit models.”
— QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
“However, these methods cannot effectively quantize the LLMs to 4-bit weights and activations.”
— Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models

Beaten on benchmarks

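Head-to-head results where a newer method reports beating OmniQuant. Each entry gives the newer method's value first and OmniQuant's second; for perplexity (PPL), lower is better. Values are copied from the source paper's tables, so verify against the cited paper.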

SEPTQ beats OmniQuant · perplexity [2-bit]
68.62 vs 75.43
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats OmniQuant · perplexity [3-bit]
33.77 vs 35.66
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
SEPTQ beats OmniQuant · perplexity [4-bit]
29.00 vs 29.45
SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
TesseraQ beats OmniQuant · Perplexity [W2A16]
8.05 vs 37.37
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Perplexity [W2A16g128]
6.82 vs 11.06
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Perplexity [W2A16g64]
6.67 vs 9.62
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Perplexity [W3A16]
5.84 vs 6.58
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Perplexity [W4A16]
5.56 vs 5.74
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
TesseraQ beats OmniQuant · Avg. [W2A16g128]
59.27 vs 47.59
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
QuaRot beats OmniQuant · PPL [4-bit, Llama 7B]
6.10 vs 14.26
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
QuaRot beats OmniQuant · PPL [4-bit, Llama 13B]
5.40 vs 12.30
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
Layer-Wise High-Impact Parameter Ratio Optimization beats OmniQuant · Perplexity [LLaMA-2-7B, W2A16]
14.64 vs 90.64
Layer-Wise High-Impact Parameter Ratio Optimization in Post-Training Quantization for Large Language Models
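
To compare gaps across entries that use different metrics, one option is to express each result as a relative gain over OmniQuant. The sketch below does this for four of the entries above. The function name `relative_gain` and the metric handling are illustrative assumptions: it treats "perplexity" and "PPL" as lower-is-better and assumes any other metric, such as "Avg.", is higher-is-better, which the source does not state explicitly.

```python
# (newer method, metric, setting, newer value, OmniQuant value),
# copied from the head-to-head entries listed above.
RESULTS = [
    ("SEPTQ", "perplexity", "2-bit", 68.62, 75.43),
    ("TesseraQ", "Perplexity", "W2A16", 8.05, 37.37),
    ("TesseraQ", "Avg.", "W2A16g128", 59.27, 47.59),
    ("QuaRot", "PPL", "4-bit, Llama 7B", 6.10, 14.26),
]

# Assumption: these labels denote perplexity, where lower is better.
LOWER_IS_BETTER = {"perplexity", "ppl"}


def relative_gain(metric: str, newer: float, omniquant: float) -> float:
    """Return the newer method's improvement as a fraction of OmniQuant's value."""
    if metric.lower() in LOWER_IS_BETTER:
        return (omniquant - newer) / omniquant
    # Assumption: any other metric (here "Avg.") is higher-is-better.
    return (newer - omniquant) / omniquant


for method, metric, setting, newer, omniquant in RESULTS:
    gain = relative_gain(metric, newer, omniquant)
    print(f"{method} [{setting}] {metric}: {gain:.1%} better than OmniQuant")
```

Relative gains are only comparable within one paper's setup: the entries come from different papers, models, and bit-width settings, so an absolute perplexity from one paper should not be compared directly with one from another.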

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.