QuaRot
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
KV-cache compression · first seen Mar 30, 2024
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites QuaRot as a baseline.
“However, a random rotation is still data-oblivious. It can smooth activation ranges, but it does not know which directions are important to attention. At INT2, this distinction matters: only four quantization levels are available, so the error should be pushed into directions that the model reads less strongly.”
— OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
“While this approach has shown effectiveness at moderate precision levels, such as a 4-bit KV cache, its applicability under more aggressive quantization settings remains unexplored.”
— KVLinC : KV Cache Quantization with Hadamard Rotation and Linear Correction
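The first critique grants that rotation "can smooth activation ranges", and notes that INT2 leaves only four levels. A toy example makes both points concrete. This is a minimal sketch, not QuaRot's implementation: it assumes a plain (non-randomized) Sylvester Hadamard matrix, per-vector asymmetric min-max INT2 quantization, and a made-up 8-dimensional key vector with one outlier channel. All function names are hypothetical. The sketch quantizes the vector directly, then quantizes it again in the rotated basis and rotates back, and compares the reconstruction error.

```python
import math


def hadamard(n):
    """Orthonormal n x n Hadamard matrix via Sylvester's construction (n must be a power of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size at each step.
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[x * s for x in row] for row in h]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def quantize_dequantize(v, bits=2):
    """Asymmetric min-max quantization of one vector. INT2 gives 2**2 = 4 levels (codes 0..3)."""
    qmax = 2 ** bits - 1
    lo, hi = min(v), max(v)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in v]
    return [lo + c * scale for c in codes], codes


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical key vector: small values plus one outlier channel (index 7).
key = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, 8.0]

# 1) Direct INT2: the outlier stretches the range, so every small entry lands on the lowest level.
direct, direct_codes = quantize_dequantize(key)

# 2) Rotate, quantize in the rotated basis, then undo the rotation with H^T.
H = hadamard(len(key))
rotated, rot_codes = quantize_dequantize(matvec(H, key))
recovered = matvec(transpose(H), rotated)

print(f"direct MSE:  {mse(key, direct):.4f}")
print(f"rotated MSE: {mse(key, recovered):.4f}")
```

Here the rotation spreads the outlier evenly over all eight coordinates and lowers the reconstruction error. However, H is fixed in advance and ignores which directions attention actually reads. That data-oblivious property is what the OSCAR critique objects to at INT2.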
Beaten on benchmarks
Head-to-head results where a newer method reports beating QuaRot. Values are copied from the source paper's tables — verify against the cited paper.
- OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
OScaR beats QuaRot · Mean [Qwen3-4B-Thinking-2507, BPE ~2.25-2.28]
71.864 vs 1.40
- OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
OScaR beats QuaRot · Mean [Qwen3-8B, BPE ~2.25-2.28]
69.416 vs 10.14
- OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
OScaR beats QuaRot · Mean [Qwen3-32B, BPE ~2.25-2.28]
74.17 vs 7.90
- OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
OScaR beats QuaRot · Mean [GLM-4.7-FP8 358B, BPE ~2.25-2.28]
78.16 vs 75.14
- OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
OScaR (ours) beats QuaRot · Avg. [Llama-3.1-8B, INT2]
41.75 vs 37.94
- OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
OScaR (ours) beats QuaRot · Avg. [Qwen3-8B, INT2]
48.74 vs 40.13
- OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
OScaR (ours) beats QuaRot · Final Score [LLaVA-v1.6-vicuna-7B, INT2, group size 128]
519 vs 481
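The OSCAR rows are tagged "BPE ~2.25-2.28", which presumably means effective bits per element, and the LLaVA row lists INT2 with group size 128. The following minimal sketch shows why group-wise INT2 storage costs more than 2 bits per element. It assumes, without support from the source, one 16-bit scale and one 16-bit zero point per group. Under that assumption a group of 128 costs 2 + 32/128 = 2.25 bits per element. The source does not say how the remaining ~0.03 bits in the OSCAR figure are accounted for. `effective_bits` is a hypothetical helper.

```python
def effective_bits(bits, group_size, meta_bits_per_group=32):
    """Average storage bits per element: quantized payload plus per-group metadata.

    meta_bits_per_group=32 assumes one fp16 scale and one fp16 zero point per group;
    this is an illustrative assumption, not a figure taken from the cited papers.
    """
    return bits + meta_bits_per_group / group_size


for g in (32, 64, 128):
    print(g, effective_bits(2, g))
```

Smaller groups track the data more closely but pay more metadata per element. This is one reason INT2 comparisons report the group size or the effective bit width alongside accuracy.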
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- SpectrumKV · SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving · Jun 7, 2026
- Hurwitz Quaternion Multiplicative Quantization (HQMQ) · Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression · May 26, 2026
- May 18, 2026
- May 18, 2026
- TriAxialKV · TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks · May 16, 2026
- KVServe · KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving · May 13, 2026
- WindowQuant · WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization · May 4, 2026
- Apr 21, 2026
- eOptShrinkQ · eOptShrinkQ: Near-Lossless KV Cache Compression Through Optimal Spectral Denoising and Quantization · Apr 6, 2026
- Apr 3, 2026
- Mar 30, 2026
- Mar 29, 2026