PyramidKV
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
KV-cache compression · first seen Jun 4, 2024
heavily superseded — a standard baseline that newer methods routinely beat
21 papers critique it · 29 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites PyramidKV as a baseline.
“existing methods often set an attenuation coefficient to control the KV cache budget in each layer, thus ignoring that the cache budget actual needed in each layer do not necessarily exhibit a monotonically decreasing pattern.”
— EvolKV: Evolutionary KV Cache Compression for LLM Inference
“PyramidKV statically allocates KV cache in a monotonically decreasing manner, which is ineffective for all input queries.”
— VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
“Figure fig:score_heatmap reveals significant differences between individual layers even within the conventionally grouped Shallow (Layer ID 0-9), Middle (10-19), and Deep (20-31) layers, thereby challenging existing three-part perspectives in PyramidKV”
— SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression
“While these methods differ in selecting tokens for KV cache retention, they generally apply a uniform budget size across layers, even though the optimal budget size may vary.”
— ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
“Hierarchical methods like PyramidKV zhang2024pyramidkv adapt by layer but lack generalizability.”
— DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
“However they still quantize (if at all) uniformly.”
— MoE-nD: Per-Layer Mixture-of-Experts Routing for Multi-Axis KV Cache Compression
“However, these methods often rely on experimental observations and pre-define some rules for cache budget allocation and KV Cache eviction.”
— LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
“These methods make binary keep/drop decisions per token; retains the importance idea but replaces the drop action with lower-precision transmission”
— SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving
“Our work shares the insight that uniform allocation is suboptimal, but differs in signal (dynamic pilot MSE vs. static attention patterns) and scope (we also allocate across KV heads within a layer).”
— KVSculpt: KV Cache Compression as Distillation
“Despite their success in reducing cache size, these methods predominantly rely on static importance scores, overlooking the dynamic, implicit relationships among tokens.”
— GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction
“Although more effective than uniform allocation, this approach relies on heuristic allocation rather than learned patterns.”
— KVCompose: Efficient Structured KV Cache Compression with Composite Tokens
“For Prefill-Only Compression, methods like SnapKV and PyramidKV, retaining all KV cache generated during the decoding phase, leading to linear cache growth with the output length and memory pressure.”
— SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation
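Several of the critiques above target one design choice: a per-layer cache budget that is fixed in advance and shrinks monotonically with depth. The sketch below illustrates that idea next to a uniform allocation. It assumes a simple linear schedule; the function names and the `spread` parameter are hypothetical, and this is not the allocation formula from the PyramidKV paper.

```python
def uniform_budgets(num_layers: int, avg_budget: int) -> list[int]:
    """Give every layer the same number of retained KV entries."""
    return [avg_budget] * num_layers


def pyramid_budgets(num_layers: int, avg_budget: int, spread: float = 0.5) -> list[int]:
    """Linearly decreasing per-layer budgets with mean close to avg_budget.

    `spread` is a hypothetical knob (0 <= spread < 1): the first layer gets
    avg_budget * (1 + spread) and the last gets avg_budget * (1 - spread).
    """
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1)")
    if num_layers == 1:
        return [avg_budget]
    first = avg_budget * (1.0 + spread)
    last = avg_budget * (1.0 - spread)
    # Interpolate from `first` (shallowest layer) to `last` (deepest layer).
    return [
        round(first + (last - first) * i / (num_layers - 1))
        for i in range(num_layers)
    ]


budgets = pyramid_budgets(num_layers=32, avg_budget=128, spread=0.5)
print(budgets[0], budgets[-1], sum(budgets))
```

Because the schedule depends only on the layer index, it is identical for every input, which is the property the EvolKV and VL-Cache quotes object to.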
Beaten on benchmarks
Head-to-head results where a newer method reports beating PyramidKV. Values are copied from the source paper's tables — verify against the cited paper.
- Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Ada-PyramidKV beats PyramidKV · Ave. Score [B=128]
42.96 vs 41.78
- Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query
LAQ beats PyramidKV · Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 128]
39.29 vs 36.16
- Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query
LAQ beats PyramidKV · Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 256]
40.51 vs 38.63
- Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query
LAQ beats PyramidKV · Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 512]
41.40 vs 40.39
- EvolKV: Evolutionary KV Cache Compression for LLM Inference
EvolKV beats PyramidKV · Avg. [KV Size = 128]
36.64 vs 36.11
- EvolKV: Evolutionary KV Cache Compression for LLM Inference
EvolKV beats PyramidKV · Avg. [KV Size = 256]
39.10 vs 38.70
- EvolKV: Evolutionary KV Cache Compression for LLM Inference
EvolKV beats PyramidKV · Avg. [KV Size = 512]
41.32 vs 40.26
- EvolKV: Evolutionary KV Cache Compression for LLM Inference
EvolKV beats PyramidKV · Avg. [KV Size = 1024]
41.72 vs 41.29
- VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
VL-Cache beats PyramidKV · CIDEr [Coco-Caption, LLaVA-Mistral-7B, 10% cache budget]
100.36 vs 66.41
- VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
VL-Cache beats PyramidKV · CIDEr [Coco-Caption, LLaVA-1.6-34B, 10% cache budget]
137.35 vs 116.91
- VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
VL-Cache beats PyramidKV · ANLS [DocVQA, LLaVA-Mistral-7B, 10% cache budget]
62 vs 60
- VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
VL-Cache beats PyramidKV · ANLS [DocVQA, LLaVA-1.6-34B, 10% cache budget]
84 vs 83
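To make the size of these margins easier to read, the reported pairs above can be turned into absolute and relative gains. This is a small sketch using only the values listed in this section; the `gains` helper and the tuple layout are illustrative assumptions. Gains are not comparable across different metrics (for example CIDEr versus an average score).

```python
# (method, setting, new-method score, PyramidKV score), copied from the list above.
RESULTS = [
    ("Ada-PyramidKV", "Ave. Score [B=128]", 42.96, 41.78),
    ("LAQ", "Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 128]", 39.29, 36.16),
    ("LAQ", "Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 256]", 40.51, 38.63),
    ("LAQ", "Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 512]", 41.40, 40.39),
    ("EvolKV", "Avg. [KV Size = 128]", 36.64, 36.11),
    ("EvolKV", "Avg. [KV Size = 256]", 39.10, 38.70),
    ("EvolKV", "Avg. [KV Size = 512]", 41.32, 40.26),
    ("EvolKV", "Avg. [KV Size = 1024]", 41.72, 41.29),
    ("VL-Cache", "CIDEr [Coco-Caption, LLaVA-Mistral-7B, 10% cache budget]", 100.36, 66.41),
    ("VL-Cache", "CIDEr [Coco-Caption, LLaVA-1.6-34B, 10% cache budget]", 137.35, 116.91),
    ("VL-Cache", "ANLS [DocVQA, LLaVA-Mistral-7B, 10% cache budget]", 62.0, 60.0),
    ("VL-Cache", "ANLS [DocVQA, LLaVA-1.6-34B, 10% cache budget]", 84.0, 83.0),
]


def gains(results):
    """Return (method, setting, absolute gain, relative gain in percent) per row."""
    rows = []
    for method, setting, new, baseline in results:
        absolute = new - baseline
        relative_pct = 100.0 * absolute / baseline
        rows.append((method, setting, absolute, relative_pct))
    return rows


for method, setting, absolute, relative_pct in gains(RESULTS):
    print(f"{method:14s} +{absolute:6.2f} ({relative_pct:5.2f}%)  {setting}")
```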
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- STaR-KV · STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models · Jun 1, 2026
- May 29, 2026
- May 28, 2026
- May 26, 2026
- May 25, 2026
- CONF-KV · CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM · May 24, 2026
- May 21, 2026
- May 12, 2026
- Global Retention-Based KV Eviction · Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction · May 10, 2026
- ReST-KV · ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing · May 9, 2026
- May 8, 2026
- fixed-contract diagnostic · When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression · May 7, 2026