GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
KV-cache compression · first seen Mar 8, 2024
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites GEAR as a baseline.
“At long context lengths, this overhead becomes non-negligible, limiting the overall compression benefits of KV cache quantization.”
— KVLinC: KV Cache Quantization with Hadamard Rotation and Linear Correction
“Despite its low accuracy drop, its need to solve small optimization problems for the low-rank decomposition leads to runtime overhead.”
— InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
Beaten on benchmarks
Head-to-head results where a newer method reports beating GEAR. Values are copied from the source paper's tables — verify against the cited paper.
- KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · GSM8K [Llama 3.1 8B]
56.9 vs 52.8
- KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · MMLU [Llama 3.1 8B]
60.1 vs 59.6
- KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · GSM8K [MN-Minitron 8B]
60.3 vs 57.9
- KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · MMLU [MN-Minitron 8B]
64.1 vs 63.6
- KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · GSM8K [Mistral NeMo 12B]
62.0 vs 59.8
- KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · MMLU [Mistral NeMo 12B]
64.4 vs 64.0
- CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation
CSR beats GEAR · Average [Llama2-7B-Chat (4-bit comparable)]
37.58 vs 37.42
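The size of each win is easier to judge as a gap than as a pair of scores. The sketch below is only an illustration: it recomputes the absolute margin for every head-to-head entry listed above, using the values exactly as they appear on this page. The names `results` and `margin` are made up for this example, and the scores should still be verified against the cited papers.

```python
# Scores copied from the list above, not re-checked against the papers:
# (newer method, benchmark, model, newer method's score, GEAR's score)
results = [
    ("KVTC", "GSM8K", "Llama 3.1 8B", 56.9, 52.8),
    ("KVTC", "MMLU", "Llama 3.1 8B", 60.1, 59.6),
    ("KVTC", "GSM8K", "MN-Minitron 8B", 60.3, 57.9),
    ("KVTC", "MMLU", "MN-Minitron 8B", 64.1, 63.6),
    ("KVTC", "GSM8K", "Mistral NeMo 12B", 62.0, 59.8),
    ("KVTC", "MMLU", "Mistral NeMo 12B", 64.4, 64.0),
    ("CSR", "Average", "Llama2-7B-Chat (4-bit comparable)", 37.58, 37.42),
]


def margin(newer: float, gear: float) -> float:
    """Absolute gap in benchmark points, rounded to hide float noise."""
    return round(newer - gear, 2)


# One (method, benchmark, model, gap) tuple per head-to-head entry.
margins = [(m, b, model, margin(n, g)) for m, b, model, n, g in results]

for method, bench, model, gap in margins:
    print(f"{method} vs GEAR · {bench} [{model}]: +{gap}")
```

On these numbers the GSM8K gaps range from 2.2 to 4.1 points, every MMLU gap is 0.5 points or less, and the CSR average differs by 0.16.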
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- SpectrumKV · SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving · Jun 7, 2026
- Hurwitz Quaternion Multiplicative Quantization (HQMQ) · Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression · May 26, 2026
- May 18, 2026
- May 18, 2026
- TriAxialKV · TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks · May 16, 2026
- KVServe · KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving · May 13, 2026
- WindowQuant · WindowQuant: Mixed-Precision KV Cache Quantization based on Window-Level Similarity for VLMs Inference Optimization · May 4, 2026
- Apr 21, 2026
- eOptShrinkQ · eOptShrinkQ: Near-Lossless KV Cache Compression Through Optimal Spectral Denoising and Quantization · Apr 6, 2026
- Apr 3, 2026
- Mar 30, 2026
- Mar 29, 2026