Is QuaRot superseded?

QuaRot (KV-cache compression) is superseded: newer methods cite it as a baseline and beat it. 2 papers critique it and 2 beat it on benchmarks, ranking it #46 of 234 most-superseded methods. Its sub-problem is the cluster led by KIVI. Newer alternatives in the same sub-problem include SpectrumKV, Hurwitz Quaternion Multiplicative Quantization (HQMQ), OSCAR, OScaR, and TriAxialKV.

Superseded baseline · #46 of 234 most-superseded

QuaRot

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

KV-cache compression · first seen Mar 30, 2024

What papers say

Verbatim critique sentences, each from a paper that cites QuaRot as a baseline.

“However, a random rotation is still data-oblivious. It can smooth activation ranges, but it does not know which directions are important to attention. At INT2, this distinction matters: only four quantization levels are available, so the error should be pushed into directions that the model reads less strongly.”
— OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
“While this approach has shown effectiveness at moderate precision levels, such as a 4-bit KV cache, its applicability under more aggressive quantization settings remains unexplored.”
— KVLinC: KV Cache Quantization with Hadamard Rotation and Linear Correction
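
The first critique quoted above turns on two points: a random rotation smooths value ranges without looking at the data, and INT2 leaves only four quantization levels. The following is a minimal sketch of both points in plain Python, not the implementation from QuaRot or from any of the cited papers. It assumes a randomized Hadamard transform as the rotation and simple min/max uniform quantization; all function names, the vector length of 64, and the injected outlier value are illustrative assumptions.

```python
import math
import random


def hadamard_transform(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def random_rotate(x, signs):
    """Data-oblivious rotation: random sign flips followed by a Hadamard transform."""
    return hadamard_transform([s * v for s, v in zip(signs, x)])


def inverse_rotate(y, signs):
    """Undo random_rotate; the orthonormal Hadamard matrix is its own inverse."""
    return [s * v for s, v in zip(signs, hadamard_transform(y))]


def quantize_dequantize(x, bits=2):
    """Uniform min/max quantization; bits=2 leaves only four representable levels."""
    levels = 2 ** bits
    lo, hi = min(x), max(x)
    if hi == lo:
        return list(x)
    step = (hi - lo) / (levels - 1)
    return [lo + round((v - lo) / step) * step for v in x]


rng = random.Random(0)
n = 64
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
x[5] = 40.0  # hypothetical outlier channel, chosen only for illustration
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]

rotated = random_rotate(x, signs)

# The rotation preserves the vector's energy but spreads the outlier
# across all coordinates, so the largest magnitude shrinks.
print("max |x| before rotation:", max(abs(v) for v in x))
print("max |x| after rotation: ", max(abs(v) for v in rotated))

# At 2 bits, every rotated value collapses onto one of at most four levels.
print("distinct INT2 levels:", sorted(set(quantize_dequantize(rotated, bits=2))))
```

The signs here are drawn at random and never consult `x`, which is the sense in which such a rotation is data-oblivious: it narrows the range to be quantized, but it cannot steer the remaining error toward directions that attention reads less strongly.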

Beaten on benchmarks

Head-to-head results where a newer method reports beating QuaRot. Values are copied from the source paper's tables — verify against the cited paper.

OSCAR beats QuaRot · Mean [Qwen3-4B-Thinking-2507, BPE ~2.25-2.28]
OSCAR 71.864 vs QuaRot 1.40
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
OSCAR beats QuaRot · Mean [Qwen3-8B, BPE ~2.25-2.28]
OSCAR 69.416 vs QuaRot 10.14
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
OSCAR beats QuaRot · Mean [Qwen3-32B, BPE ~2.25-2.28]
OSCAR 74.17 vs QuaRot 7.90
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
OSCAR beats QuaRot · Mean [GLM-4.7-FP8 358B, BPE ~2.25-2.28]
OSCAR 78.16 vs QuaRot 75.14
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
OScaR beats QuaRot · Avg. [Llama-3.1-8B, INT2]
OScaR 41.75 vs QuaRot 37.94
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
OScaR beats QuaRot · Avg. [Qwen3-8B, INT2]
OScaR 48.74 vs QuaRot 40.13
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
OScaR beats QuaRot · Final Score [LLaVA-v1.6-vicuna-7B, INT2, group size 128]
OScaR 519 vs QuaRot 481
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.