CachePrune

CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference

KV-cache compression · first seen May 22, 2026

current frontier — recent, not yet superseded in the knowledge base

0 papers critique it · 0 beat it on benchmarks

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
May 22, 2026
CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration
Apr 28, 2026
Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference
Apr 19, 2026
TableCache: Primary Foreign Key Guided KV Cache Precomputation for Low Latency Text-to-SQL
Jan 13, 2026
OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration
Jan 5, 2026
SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
Sep 29, 2025