CacheBlend
KV-cache compression
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites CacheBlend as a baseline.
“Nonetheless, existing systems (vLLM, CacheBlend, CacheCraft, Epic) [kwon2023efficient, yao2025cacheblend, agarwal2025cache, hu2024epic] operate at coarse granularity and thus fundamentally cannot support selective sharing: they reuse KV cache at the level of fixed chunks (e.g., 512 tokens [agarwal2025cache]) or entire prompt, so the presence of a single sensitive token (e.g., PII) invalidates the whole unit and discards most otherwise reusable content”
— CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
“Reuse strategies effective for agent-side generation can be decision-non-invariant for judges, revealing a failure mode overlooked by prior work.”
— When KV Cache Reuse Fails in Multi-Agent Systems: Cross-Candidate Interaction is Crucial for LLM Judges
“rely on exact context matching, which is unsuitable for real user scenarios”
— SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
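The granularity critique quoted above can be made concrete with a toy calculation. The sketch below is not any cited system's implementation; it only counts how many tokens stay reusable when a single sensitive token invalidates its entire reuse unit. The function name, the 2048-token prompt, and the sensitive position are hypothetical; the 512-token chunk size is the example figure from the quote.

```python
def reusable_tokens(num_tokens, sensitive_positions, chunk_size):
    """Count tokens whose cached KV can still be reused when any
    sensitive token invalidates the whole chunk that contains it."""
    sensitive = set(sensitive_positions)
    reusable = 0
    for start in range(0, num_tokens, chunk_size):
        end = min(start + chunk_size, num_tokens)
        # A chunk is reusable only if it contains no sensitive token.
        if not any(start <= pos < end for pos in sensitive):
            reusable += end - start
    return reusable


# Hypothetical prompt: 2048 tokens with one sensitive token at position 700.
chunk_level = reusable_tokens(2048, [700], chunk_size=512)
token_level = reusable_tokens(2048, [700], chunk_size=1)
print(chunk_level, token_level)  # 1536 2047
```

With 512-token units, one sensitive token costs 512 tokens of reuse; with per-token units it costs one, which is the gap the quoted sentence describes.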
Beaten on benchmarks
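Head-to-head results where a newer method reports beating CacheBlend. Each pair is listed as newer method vs CacheBlend. For TTFT (time to first token) lower is better; for Accuracy, EM Score, and RougeL higher is better. Values are copied from the source paper's tables; verify them against the cited paper.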
- CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
CachePrune beats CacheBlend · TTFT [WildChat]
134 vs 171
- CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
CachePrune beats CacheBlend · TTFT [ShareGPT]
177 vs 210
- CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
CachePrune beats CacheBlend · TTFT [LMSys]
74 vs 93
- KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · Accuracy [Recompute in Prefill & Decode Stage, Yi1.5-9B, GSM8K]
62.30 vs 43.55
- KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · EM Score [Recompute in Prefill & Decode Stage, Qwen2.5-7B, DROP]
56.15 vs 46.09
- KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · EM Score [Recompute in Prefill & Decode Stage, Llama3.1-8B, DROP]
68.75 vs 57.81
- KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · EM Score [Recompute in Prefill & Decode Stage, Yi1.5-9B, DROP]
65.63 vs 60.15
- KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · RougeL [Recompute in Prefill & Decode Stage, Qwen2.5-7B, SAMSum]
19.05 vs 16.20
- KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · RougeL [Recompute in Prefill & Decode Stage, Llama3.1-8B, SAMSum]
17.17 vs 14.90
- KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · RougeL [Recompute in Prefill & Decode Stage, Yi1.5-9B, SAMSum]
16.34 vs 14.47
- KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · Accuracy [Recompute in Prefill Stage, Qwen2.5-7B, GSM8K]
63.28 vs 60.93
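To compare gaps across metrics with different scales, the pairs above can be turned into relative improvements over CacheBlend. This is a minimal sketch: the `relative_gain` helper and the row labels are illustrative, and the numbers are the ones listed above, so they carry the same caveat about checking the source tables.

```python
def relative_gain(new, base, lower_is_better):
    """Percent improvement of the newer method relative to CacheBlend."""
    diff = (base - new) if lower_is_better else (new - base)
    return 100.0 * diff / base


# (label, newer-method value, CacheBlend value, lower_is_better)
results = [
    ("CachePrune TTFT WildChat", 134, 171, True),
    ("CachePrune TTFT ShareGPT", 177, 210, True),
    ("CachePrune TTFT LMSys", 74, 93, True),
    ("KVShare Accuracy Yi1.5-9B GSM8K", 62.30, 43.55, False),
    ("KVShare EM Qwen2.5-7B DROP", 56.15, 46.09, False),
    ("KVShare EM Llama3.1-8B DROP", 68.75, 57.81, False),
    ("KVShare EM Yi1.5-9B DROP", 65.63, 60.15, False),
    ("KVShare RougeL Qwen2.5-7B SAMSum", 19.05, 16.20, False),
    ("KVShare RougeL Llama3.1-8B SAMSum", 17.17, 14.90, False),
    ("KVShare RougeL Yi1.5-9B SAMSum", 16.34, 14.47, False),
    ("KVShare Accuracy Qwen2.5-7B GSM8K (prefill only)", 63.28, 60.93, False),
]

for label, new, base, lower in results:
    print(f"{label}: {relative_gain(new, base, lower):.1f}% better than CacheBlend")
```

For example, the WildChat row works out to (171 - 134) / 171, roughly a 21.6% reduction in TTFT.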
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 22, 2026
- Apr 28, 2026
- Predictive Multi-Tier Memory Management · Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference · Apr 19, 2026
- TableCache · TableCache: Primary Foreign Key Guided KV Cache Precomputation for Low Latency Text-to-SQL · Jan 13, 2026
- Jan 5, 2026
- SemShareKV · SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching · Sep 29, 2025