GEAR (KV-cache compression): superseded, cited as a baseline and beaten by newer methods. 2 papers critique it and 2 beat it on benchmarks; it ranks #43 of 234 most-superseded. Sub-problem: cluster led by KIVI. Newer alternatives in the same sub-problem include SpectrumKV, Hurwitz Quaternion Multiplicative Quantization (HQMQ), OSCAR, OScaR, and TriAxialKV.

Superseded baseline · #43 of 234 most-superseded

GEAR

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

KV-cache compression · first seen Mar 8, 2024

superseded — cited as a baseline and beaten by newer methods

2 papers critique it · 2 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites GEAR as a baseline.

“At long context lengths, this overhead becomes non-negligible, limiting the overall compression benefits of KV cache quantization.”
— KVLinC : KV Cache Quantization with Hadamard Rotation and Linear Correction
“Despite its low accuracy drop, its need to solve small optimization problems for the low-rank decomposition leads to runtime overhead.”
— InnerQ: Hardware-aware Tuning-free Quantization of KV Cache for Large Language Models
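The runtime-overhead critique above is easier to picture with a toy example. The following is a minimal sketch, assuming the error-correction step amounts to approximating the quantization residual with a low-rank term. The 2-bit per-row quantizer, the rank-1 power iteration, the helper names, and the toy matrix are all illustrative assumptions, not GEAR's implementation.

```python
import math

def quantize_row(row, bits=2):
    """Uniform quantize-then-dequantize of one row (illustrative quantizer)."""
    lo, hi = min(row), max(row)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((x - lo) / scale) * scale for x in row]

def frobenius(matrix):
    """Frobenius norm of a list-of-lists matrix."""
    return math.sqrt(sum(x * x for row in matrix for x in row))

def rank1_approx(residual, iters=20):
    """Rank-1 approximation of the residual via power iteration."""
    n_rows, n_cols = len(residual), len(residual[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # unit-norm starting vector
    for _ in range(iters):
        # u = R v, then w = R^T u; normalizing w gives the next v.
        u = [sum(r * c for r, c in zip(row, v)) for row in residual]
        w = [sum(residual[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(r * c for r, c in zip(row, v)) for row in residual]
    return [[ui * vj for vj in v] for ui in u]  # (R v) v^T

# Hypothetical toy "KV block": 6 tokens x 8 channels, deterministic values.
kv = [[math.sin(0.7 * i + 1.3 * j) * (1 + i) for j in range(8)] for i in range(6)]
dequantized = [quantize_row(row) for row in kv]
residual = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(kv, dequantized)]
correction = rank1_approx(residual)

err_quantized = frobenius(residual)
err_corrected = frobenius(
    [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(residual, correction)]
)
print(f"quantization only:      {err_quantized:.4f}")
print(f"with rank-1 correction: {err_corrected:.4f}")
```

The correction lowers the reconstruction error, but the iterative loop in `rank1_approx` is extra work on top of plain quantization, which is the kind of cost the quoted critiques describe as runtime overhead.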

Beaten on benchmarks

Head-to-head results where a newer method reports beating GEAR. Values are copied from the source papers' tables; verify them against the cited paper.

KVTC beats GEAR · GSM8K [Llama 3.1 8B]
56.9 vs 52.8
KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · MMLU [Llama 3.1 8B]
60.1 vs 59.6
KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · GSM8K [MN-Minitron 8B]
60.3 vs 57.9
KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · MMLU [MN-Minitron 8B]
64.1 vs 63.6
KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · GSM8K [Mistral NeMo 12B]
62.0 vs 59.8
KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats GEAR · MMLU [Mistral NeMo 12B]
64.4 vs 64.0
KV Cache Transform Coding for Compact Storage in LLM Inference
CSR beats GEAR · Average [Llama2-7B-Chat (4-bit comparable)]
37.58 vs 37.42
CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation
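To compare the sizes of these wins, the margins can be computed directly from the values listed above. This is a small sketch; the tuple layout and variable names are illustrative, and the scores come from different papers and evaluation settings, so margins are only comparable within a single source.

```python
# (newer method, benchmark, model, newer score, GEAR score), copied from the entries above.
results = [
    ("KVTC", "GSM8K", "Llama 3.1 8B", 56.9, 52.8),
    ("KVTC", "MMLU", "Llama 3.1 8B", 60.1, 59.6),
    ("KVTC", "GSM8K", "MN-Minitron 8B", 60.3, 57.9),
    ("KVTC", "MMLU", "MN-Minitron 8B", 64.1, 63.6),
    ("KVTC", "GSM8K", "Mistral NeMo 12B", 62.0, 59.8),
    ("KVTC", "MMLU", "Mistral NeMo 12B", 64.4, 64.0),
    ("CSR", "Average", "Llama2-7B-Chat (4-bit comparable)", 37.58, 37.42),
]

# Margin = newer method's score minus GEAR's score, rounded to hide float noise.
margins = [
    (method, bench, model, round(new - gear, 2))
    for method, bench, model, new, gear in results
]

for method, bench, model, margin in margins:
    print(f"{method:5s} {bench:8s} {model:36s} +{margin:.2f}")

largest = max(margins, key=lambda row: row[3])
smallest = min(margins, key=lambda row: row[3])
```

On these numbers, the KVTC margins on GSM8K (2.2 to 4.1 points) are several times larger than its margins on MMLU (0.4 to 0.5 points), and the CSR margin is 0.16.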

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.