CaM
KV-cache compression
superseded — cited as a baseline and beaten by newer methods
6 papers critique it · 6 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites CaM as a baseline.
“these single-modal optimizations exhibit limited efficacy in MLLMs due to cross-modal distribution shifts and attention pattern divergence, failing to preserve modality-specific information fidelity.”
— FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference

“In contrast, CaM [zhang2024cam] adaptively merges evicted value states into others but does not merge the corresponding keys.”
— KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference

“most merging methods rely on local heuristics, they often funnel many evicted tokens into a small set of span-boundary tokens. These boundary tokens therefore become the main information carriers, overloading their representations and making them prone to over-merging: excessive aggregation can blur or even erase their original semantics, thereby degrading overall performance.”
— GRKV: Global Regression for Training-Free KV Cache Compression in Long-Context LLMs

“CaM merges the values of evicted tokens only probabilistically, with a non-negligible probability of discarding them and hence losing information.”
— WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models

“a fundamental limitation of these approaches is their uniform treatment of keys and values during merging despite their distinct distributional characteristics.”
— Homogeneous Keys, Heterogeneous Values: Exploiting Local KV Cache Asymmetry for Long-Context LLMs

“Notably, both Quest and CaM report results only on LLaMA2, without comparisons to other eviction methods, limiting their relevance to current frontier models.”
— CAOTE: KV Cache Selection for LLMs via Attention Output Error-Based Token Eviction
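
Two of the critiques above describe concrete behaviours: evicted values are merged while keys are not (KeepKV), and the merge happens only with some probability (WeightedKV). The toy sketch below illustrates just those two behaviours on plain Python lists. It is not CaM's actual algorithm: the function name `evict_with_value_merge`, the `merge_prob` parameter, and the simple averaging rule are all assumptions made for illustration.

```python
import random


def evict_with_value_merge(keys, values, evict_idx, target_idx, merge_prob, rng):
    """Evict one cache slot; with probability merge_prob, fold its value
    into the target slot first. Keys are never merged.

    Hypothetical helper: the averaging rule and merge_prob are assumptions,
    not formulas taken from the CaM paper.
    """
    keys = [list(k) for k in keys]        # copy so the caller's cache is untouched
    values = [list(v) for v in values]

    merged = rng.random() < merge_prob    # Bernoulli gate on the merge
    if merged:
        # Value-only merge: average the evicted value into the target value.
        values[target_idx] = [
            (a + b) / 2 for a, b in zip(values[target_idx], values[evict_idx])
        ]
    # The evicted key is dropped in both cases; nothing is folded into other keys.
    del keys[evict_idx]
    del values[evict_idx]
    return keys, values, merged


cache_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache_values = [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]

# merge_prob=0.0 never merges, so the evicted value is discarded outright.
k, v, merged = evict_with_value_merge(
    cache_keys, cache_values, evict_idx=0, target_idx=1,
    merge_prob=0.0, rng=random.Random(0),
)
print(merged, v)
```

With `merge_prob` below 1, some evicted values are lost entirely, which is the information loss WeightedKV points to. In both branches the surviving key is left unchanged, which is the key/value asymmetry KeepKV highlights.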
Beaten on benchmarks
Head-to-head results where a newer method reports beating CaM. Values are copied from the source paper's tables — verify against the cited paper.
- SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
SemantiCache beats CaM · Average score [Llama-3-8B, 20% cache budget] · 30.01 vs 27.86
SemantiCache beats CaM · Average score [Mistral-7B, 20% cache budget] · 39.68 vs 35.27
SemantiCache beats CaM · Accuracy [Mistral-7B, L=32k, cache budget 1024] · 91.02 vs 86.52
SemantiCache beats CaM · TPOT (s) [Llama-3-8B, 32k context] · 0.031 vs 0.039
SemantiCache beats CaM · Memory (GB) [Llama-3-8B, 32k context] · 15.94 vs 17.03
- Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression
Meta-Soft beats CaM · Avg [All context lengths] · 75.72 vs 68.20
- KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference
KeepKV beats CaM · NrtvQA [Llama-2-7B] · 17.32 vs 11.79
KeepKV beats CaM · Qasper [Llama-2-7B] · 7.48 vs 5.1
KeepKV beats CaM · MF-en [Llama-2-7B] · 22.2 vs 19.12
KeepKV beats CaM · HotpotQA [Llama-2-7B] · 8.51 vs 7.26
KeepKV beats CaM · Musique [Llama-2-7B] · 4.65 vs 3.64
KeepKV beats CaM · TriviaQA [Llama-2-7B] · 88.87 vs 87.31
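
The head-to-head values above sit on different scales (scores, seconds, gigabytes), so raw differences are hard to compare. A minimal sketch of turning a few of them into relative improvements follows. The helper `relative_gain` is hypothetical, and treating TPOT and memory as lower-is-better is an assumption stated here rather than something the entries spell out.

```python
def relative_gain(new, baseline, lower_is_better=False):
    """Percent improvement of `new` over `baseline`.

    For lower-is-better metrics (assumed here for TPOT and memory),
    a reduction counts as a positive gain.
    """
    if lower_is_better:
        return (baseline - new) / baseline * 100.0
    return (new - baseline) / baseline * 100.0


# (label, newer method, CaM, lower_is_better); values copied from the list above
results = [
    ("SemantiCache avg score, Llama-3-8B", 30.01, 27.86, False),
    ("SemantiCache TPOT (s), Llama-3-8B", 0.031, 0.039, True),
    ("SemantiCache memory (GB), Llama-3-8B", 15.94, 17.03, True),
    ("Meta-Soft avg, all context lengths", 75.72, 68.20, False),
    ("KeepKV NrtvQA, Llama-2-7B", 17.32, 11.79, False),
    ("KeepKV TriviaQA, Llama-2-7B", 88.87, 87.31, False),
]

for label, new, cam, lower in results:
    print(f"{label}: {relative_gain(new, cam, lower):+.1f}%")
```

Relative gains computed this way are only as reliable as the copied values, so the same caveat applies: verify against the cited paper.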
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- STaR-KV · STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models · Jun 1, 2026
- May 29, 2026
- May 28, 2026
- May 26, 2026
- May 25, 2026
- CONF-KV · CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM · May 24, 2026
- May 21, 2026
- May 12, 2026
- Global Retention-Based KV Eviction · Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction · May 10, 2026
- ReST-KV · ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing · May 9, 2026
- May 8, 2026
- fixed-contract diagnostic · When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression · May 7, 2026