Is MiniCache superseded?

MiniCache (KV-cache compression) is superseded: newer methods cite it as a baseline and report beating it. 5 papers critique it and 2 report beating it on benchmarks, which ranks it #27 of 234 among the most-superseded methods. Its sub-problem cluster is led by MiniCache itself. Newer alternatives in the same sub-problem include CachePrune, CacheFlow, Predictive Multi-Tier Memory Management, TableCache, and OrbitFlow.

Superseded baseline · #27 of 234 most-superseded

MiniCache

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models

KV-cache compression · first seen May 23, 2024

superseded — cited as a baseline and beaten by newer methods

5 papers critique it · 2 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites MiniCache as a baseline.

“these single-modal optimizations exhibit limited efficacy in MLLMs due to cross-modal distribution shifts and attention pattern divergence, failing to preserve modality-specific information fidelity.”
— FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference
“the direct sharing method (e.g., MiniCache) suffers from a significant performance drop when the compression ratio exceeds 20%.”
— CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
“Our analysis, however, shows that such similarity, though present to some extent, is not consistently strong enough across layers to support robust compression, leading to nontrivial accuracy degradation in practice and limited compression rate”
— xKV: Cross-Layer SVD for KV-Cache Compression
“their approaches are restricted to sharing in the layer or text segment within adjacent layers or the same LLM, limiting the broader applicability”
— SemShareKV: Efficient KVCache Sharing for Semantically Similar Prompts via Token-Level LSH Matching
“Although effective in reducing memory usage, these methods risk degrading model accuracy.”
— OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration
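Several of the critiques above turn on whether the KV states of adjacent layers are similar enough to be shared or merged. One way to probe that claim is to measure per-token cosine similarity between neighbouring layers. The following is a minimal sketch under stated assumptions, not the procedure used by MiniCache or by any of the cited papers: the function name `adjacent_layer_similarity`, the tensor layout, and the synthetic data are all hypothetical, and a real check would use key or value tensors extracted from an actual model.

```python
import numpy as np

def adjacent_layer_similarity(kv: np.ndarray) -> np.ndarray:
    """Mean per-token cosine similarity between each pair of adjacent layers.

    kv has shape (num_layers, num_tokens, hidden_dim) (an assumed layout);
    the result has shape (num_layers - 1,).
    """
    # Normalize every token vector to unit length; the floor avoids division by zero.
    norms = np.linalg.norm(kv, axis=-1, keepdims=True)
    unit = kv / np.maximum(norms, 1e-12)
    # Dot product of matching tokens in layer l and layer l + 1, averaged over tokens.
    return (unit[:-1] * unit[1:]).sum(axis=-1).mean(axis=-1)

# Synthetic stand-in for a key cache (NOT real model data): a component shared
# by all layers plus per-layer noise whose scale grows with depth.
rng = np.random.default_rng(0)
num_layers, num_tokens, hidden_dim = 6, 32, 64
shared = rng.standard_normal((num_tokens, hidden_dim))
noise_scale = np.linspace(0.1, 2.0, num_layers)[:, None, None]
kv = shared + noise_scale * rng.standard_normal((num_layers, num_tokens, hidden_dim))

sims = adjacent_layer_similarity(kv)
for layer, sim in enumerate(sims):
    print(f"layers {layer}-{layer + 1}: mean cosine similarity {sim:.3f}")
```

On real caches, layer pairs whose similarity falls well below the others would be the ones where direct sharing is most likely to cost accuracy, which is the failure mode the xKV and CommonKV quotes describe.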

Beaten on benchmarks

Head-to-head results where a newer method reports beating MiniCache. Values are copied from the source papers' tables; verify them against the cited paper.

CommonKV beats MiniCache · Avg. [Llama3.1-8B-Instruct, ratio 0.3]
72.31 (CommonKV) vs 18.90 (MiniCache)
CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
CommonKV beats MiniCache · Avg. [Mistral-v0.2-7B-Instruct, ratio 0.3]
71.84 (CommonKV) vs 22.83 (MiniCache)
CommonKV: Compressing KV Cache with Cross-layer Parameter Sharing
xKV beats MiniCache · Avg. [Llama-3.1-8B-Instruct]
88.50 (xKV) vs 45.04 (MiniCache)
xKV: Cross-Layer SVD for KV-Cache Compression
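
To make the size of each reported margin explicit, the listed averages can be subtracted directly. This is a small sketch using only the numbers above; the helper name `score_gap` is hypothetical. The listing does not say which benchmark suite each "Avg." covers, so gaps from different papers should not be compared with one another.

```python
# Head-to-head averages as listed above:
# (newer method, setting, newer method's score, MiniCache's score).
results = [
    ("CommonKV", "Llama3.1-8B-Instruct, ratio 0.3", 72.31, 18.90),
    ("CommonKV", "Mistral-v0.2-7B-Instruct, ratio 0.3", 71.84, 22.83),
    ("xKV", "Llama-3.1-8B-Instruct", 88.50, 45.04),
]

def score_gap(newer: float, baseline: float) -> float:
    """Absolute gap in points, rounded to two decimals."""
    return round(newer - baseline, 2)

gaps = [score_gap(newer, base) for _, _, newer, base in results]
for (method, setting, newer, base), gap in zip(results, gaps):
    print(f"{method} [{setting}]: +{gap:.2f} points over MiniCache")
```

The resulting gaps are 53.41, 49.01, and 43.46 points respectively.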

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.