Is CacheBlend superseded?

CacheBlend (KV-cache compression): superseded, meaning it is cited as a baseline and beaten by newer methods. 3 papers critique it and 2 beat it on benchmarks, which ranks it #33 of 234 most-superseded. Sub-problem: cluster led by MiniCache. Newer alternatives in the same sub-problem include CachePrune, CacheFlow, Predictive Multi-Tier Memory Management, TableCache, and OrbitFlow.



What papers say

Verbatim critique sentences, each from a paper that cites CacheBlend as a baseline.

“Nonetheless, existing systems (e.g., vLLM, CacheBlend, CacheCraft, Epic) [kwon2023efficient, yao2025cacheblend, agarwal2025cache, hu2024epic] operate at coarse granularity and thus fundamentally cannot support selective sharing: they reuse KV cache at the level of fixed chunks (e.g., 512 tokens [agarwal2025cache]) or entire prompt, so the presence of a single sensitive token (e.g., PII) invalidates the whole unit and discards most otherwise reusable content”
— CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
“Reuse strategies effective for agent-side generation can be decision-non-invariant for judges, revealing a failure mode overlooked by prior work.”
— When KV Cache Reuse Fails in Multi-Agent Systems: Cross-Candidate Interaction is Crucial for LLM Judges
“rely on exact context matching, which is unsuitable for real user scenarios”
— SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
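
The granularity critique in the first quote can be made concrete with a small toy model. The sketch below is not CacheBlend's or CachePrune's algorithm; it only counts how many prompt tokens remain reusable when one sensitive token invalidates its whole fixed-size chunk, compared with an idealized scheme that drops only the sensitive tokens themselves. The 512-token chunk size comes from the quote; the function names, the prompt length, and the sensitive positions are hypothetical. A real system must also account for the fact that the KV entries of later tokens depend on earlier ones, which this count ignores.

```python
def reusable_tokens_chunked(num_tokens, sensitive_positions, chunk_size=512):
    """Tokens still reusable when a sensitive token invalidates its whole chunk."""
    # Index of every chunk that contains at least one sensitive token.
    bad_chunks = {pos // chunk_size for pos in sensitive_positions}
    reusable = 0
    for start in range(0, num_tokens, chunk_size):
        if start // chunk_size not in bad_chunks:
            # The last chunk may be shorter than chunk_size.
            reusable += min(chunk_size, num_tokens - start)
    return reusable


def reusable_tokens_per_token(num_tokens, sensitive_positions):
    """Idealized upper bound: only the sensitive tokens themselves are dropped."""
    return num_tokens - len(set(sensitive_positions))


# Hypothetical example: a 2048-token prompt with two sensitive tokens.
num_tokens = 2048
sensitive = [100, 700]

chunked = reusable_tokens_chunked(num_tokens, sensitive)
fine_grained = reusable_tokens_per_token(num_tokens, sensitive)

print(f"chunk-level reuse: {chunked}/{num_tokens} tokens")
print(f"token-level reuse: {fine_grained}/{num_tokens} tokens")
```

In this toy setting two sensitive tokens falling in different chunks discard half of the prompt at chunk granularity, which is the effect the quoted sentence describes.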

Beaten on benchmarks

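Head-to-head results where a newer method reports beating CacheBlend. Each entry lists the newer method's value first and CacheBlend's value second. Values are copied from the source paper's tables; verify them against the cited paper. For TTFT (time to first token) lower is better; for Accuracy, EM Score, and RougeL higher is better. TTFT values are listed without units.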

CachePrune beats CacheBlend · TTFT [WildChat]
134 vs 171
CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
CachePrune beats CacheBlend · TTFT [ShareGPT]
177 vs 210
CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
CachePrune beats CacheBlend · TTFT [LMSys]
74 vs 93
CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
KVShare beats CacheBlend · Accuracy [Recompute in Prefill & Decode Stage, Yi1.5-9B, GSM8K]
62.30 vs 43.55
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · EM Score [Recompute in Prefill & Decode Stage, Qwen2.5-7B, DROP]
56.15 vs 46.09
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · EM Score [Recompute in Prefill & Decode Stage, Llama3.1-8B, DROP]
68.75 vs 57.81
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · EM Score [Recompute in Prefill & Decode Stage, Yi1.5-9B, DROP]
65.63 vs 60.15
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · RougeL [Recompute in Prefill & Decode Stage, Qwen2.5-7B, SAMSum]
19.05 vs 16.20
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · RougeL [Recompute in Prefill & Decode Stage, Llama3.1-8B, SAMSum]
17.17 vs 14.90
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · RougeL [Recompute in Prefill & Decode Stage, Yi1.5-9B, SAMSum]
16.34 vs 14.47
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
KVShare beats CacheBlend · Accuracy [Recompute in Prefill Stage, Qwen2.5-7B, GSM8K]
63.28 vs 60.93
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
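
To compare the gaps across metrics with different scales, the listed pairs can be converted into relative improvements over CacheBlend. The sketch below is only arithmetic on the values shown above; the helper name `relative_improvement` and the tuple layout are illustrative assumptions, and the direction of each metric (TTFT lower is better, the rest higher is better) is assumed from the metrics' usual definitions.

```python
# (newer method, metric, setting, newer value, CacheBlend value, higher_is_better)
results = [
    ("CachePrune", "TTFT", "WildChat", 134, 171, False),
    ("CachePrune", "TTFT", "ShareGPT", 177, 210, False),
    ("CachePrune", "TTFT", "LMSys", 74, 93, False),
    ("KVShare", "Accuracy", "Prefill & Decode, Yi1.5-9B, GSM8K", 62.30, 43.55, True),
    ("KVShare", "EM Score", "Prefill & Decode, Qwen2.5-7B, DROP", 56.15, 46.09, True),
    ("KVShare", "EM Score", "Prefill & Decode, Llama3.1-8B, DROP", 68.75, 57.81, True),
    ("KVShare", "EM Score", "Prefill & Decode, Yi1.5-9B, DROP", 65.63, 60.15, True),
    ("KVShare", "RougeL", "Prefill & Decode, Qwen2.5-7B, SAMSum", 19.05, 16.20, True),
    ("KVShare", "RougeL", "Prefill & Decode, Llama3.1-8B, SAMSum", 17.17, 14.90, True),
    ("KVShare", "RougeL", "Prefill & Decode, Yi1.5-9B, SAMSum", 16.34, 14.47, True),
    ("KVShare", "Accuracy", "Prefill, Qwen2.5-7B, GSM8K", 63.28, 60.93, True),
]


def relative_improvement(new, baseline, higher_is_better):
    """Percent improvement of `new` over `baseline`, positive when `new` is better."""
    if higher_is_better:
        return (new - baseline) / baseline * 100.0
    # For latency-style metrics a smaller value is the improvement.
    return (baseline - new) / baseline * 100.0


for method, metric, setting, new, baseline, higher in results:
    gain = relative_improvement(new, baseline, higher)
    print(f"{method:10s} {metric:9s} [{setting}]: {gain:.1f}% better than CacheBlend")
```

The relative figures depend entirely on the copied table values, so they inherit any transcription error in the entries above.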

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.