Superseded baseline · #27 of 234 most-superseded
MiniCache
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
KV-cache compression · first seen May 23, 2024
superseded — cited as a baseline and beaten by newer methods
5 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites MiniCache as a baseline.
“these single-modal optimizations exhibit limited efficacy in MLLMs due to cross-modal distribution shifts and attention pattern divergence, failing to preserve modality-specific information fidelity.”
— FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference
“the direct sharing method (e.g., MiniCache) suffers from a significant performance drop when the compression ratio exceeds 20%.”
— CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
“Our analysis, however, shows that such similarity, though present to some extent, is not consistently strong enough across layers to support robust compression, leading to nontrivial accuracy degradation in practice and limited compression rate”
— xKV: Cross-Layer SVD for KV-Cache Compression
“their approaches are restricted to sharing in the layer or text segment within adjacent layers or the same LLM, limiting the broader applicability”
— SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
“Although effective in reducing memory usage, these methods risk degrading model accuracy.”
— OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration
Beaten on benchmarks
Head-to-head results where a newer method reports beating MiniCache. Values are copied from the source paper's tables — verify against the cited paper.
- CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats MiniCache · Avg. [Llama3.1-8B-Instruct, ratio 0.3]
72.31 vs 18.90
- CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats MiniCache · Avg. [Mistral-v0.2-7B-Instruct, ratio 0.3]
71.84 vs 22.83
- xKV: Cross-Layer SVD for KV-Cache Compression
xKV beats MiniCache · Avg. [Llama-3.1-8B-Instruct]
88.50 vs 45.04
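To make the size of each reported margin easier to read, the head-to-head values above can be turned into absolute gaps. The following is a minimal Python sketch, not part of the source page: the `results` list and the `gaps` helper are hypothetical names introduced here for illustration, and the numbers are copied from the entries above without independent verification.

```python
# Scores copied from the head-to-head entries above: (newer method, setting,
# newer method's average, MiniCache's average). Names are illustrative only.
results = [
    ("CommonKV", "Llama3.1-8B-Instruct, ratio 0.3", 72.31, 18.90),
    ("CommonKV", "Mistral-v0.2-7B-Instruct, ratio 0.3", 71.84, 22.83),
    ("xKV", "Llama-3.1-8B-Instruct", 88.50, 45.04),
]


def gaps(rows):
    """Return (method, setting, absolute gap in points) for each row."""
    return [(method, setting, round(new - base, 2))
            for method, setting, new, base in rows]


for method, setting, gap in gaps(results):
    print(f"{method} [{setting}]: +{gap:.2f} points over MiniCache")
```

The gaps come out to 53.41, 49.01, and 43.46 points respectively. The averages are taken from different papers, which may use different benchmark suites and compression settings, so the gaps are best read per paper and not compared across papers.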
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 22, 2026
- Apr 28, 2026
- Predictive Multi-Tier Memory Management · Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference · Apr 19, 2026
- TableCache · TableCache: Primary Foreign Key Guided KV Cache Precomputation for Low Latency Text-to-SQL · Jan 13, 2026
- Jan 5, 2026
- SemShareKV · SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching · Sep 29, 2025