Is PyramidKV superseded?

PyramidKV (KV-cache compression): heavily superseded, a standard baseline that newer methods routinely beat. 21 papers critique it and 29 beat it on benchmarks, ranking it #4 of 234 most-superseded. Sub-problem: cluster led by SnapKV. Newer alternatives in the same sub-problem include STaR-KV, GRKV, MomentKV, NestedKV, IndexMem.

PyramidKV

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

KV-cache compression · first seen Jun 4, 2024

What papers say

Verbatim critique sentences, each from a paper that cites PyramidKV as a baseline.

“existing methods often set an attenuation coefficient to control the KV cache budget in each layer, thus ignoring that the cache budget actual needed in each layer do not necessarily exhibit a monotonically decreasing pattern.”
— EvolKV: Evolutionary KV Cache Compression for LLM Inference
“PyramidKV statically allocates KV cache in a monotonically decreasing manner, which is ineffective for all input queries.”
— VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
“Figure fig:score_heatmap reveals significant differences between individual layers even within the conventionally grouped Shallow (Layer ID 0-9), Middle (10-19), and Deep (20-31) layers, thereby challenging existing three-part perspectives in PyramidKV”
— SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression
“While these methods differ in selecting tokens for KV cache retention, they generally apply a uniform budget size across layers, even though the optimal budget size may vary.”
— ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
“Hierarchical methods like PyramidKV adapt by layer but lack generalizability.”
— DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
“However they still quantize (if at all) uniformly.”
— MoE-nD: Per-Layer Mixture-of-Experts Routing for Multi-Axis KV Cache Compression
“However, these methods often rely on experimental observations and pre-define some rules for cache budget allocation and KV Cache eviction.”
— LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
“These methods make binary keep/drop decisions per token; retains the importance idea but replaces the drop action with lower-precision transmission”
— SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving
“Our work shares the insight that uniform allocation is suboptimal, but differs in signal (dynamic pilot MSE vs. static attention patterns) and scope (we also allocate across KV heads within a layer).”
— KVSculpt: KV Cache Compression as Distillation
“Despite their success in reducing cache size, these methods predominantly rely on static importance scores, overlooking the dynamic, implicit relationships among tokens.”
— GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction
“Although more effective than uniform allocation, this approach relies on heuristic allocation rather than learned patterns.”
— KVCompose: Efficient Structured KV Cache Compression with Composite Tokens
“For Prefill-Only Compression, methods like SnapKV and PyramidKV, retaining all KV cache generated during the decoding phase, leading to linear cache growth with the output length and memory pressure.”
— SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation
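
Several of the critiques above target a per-layer cache budget that shrinks monotonically with depth. As a rough illustration only, the sketch below contrasts a uniform allocation with a linearly decreasing one that has the same average budget. The function names, the `steepness` parameter, and the linear schedule are assumptions made for this example; they are not taken from the PyramidKV paper or its code.

```python
def uniform_budgets(num_layers: int, avg_budget: int) -> list[int]:
    """Every layer keeps the same number of KV entries."""
    return [avg_budget] * num_layers


def pyramid_budgets(num_layers: int, avg_budget: int,
                    steepness: float = 0.5) -> list[int]:
    """Linearly decreasing per-layer budgets centred on avg_budget.

    steepness is a hypothetical knob (0 <= steepness < 1): the first layer
    gets avg_budget * (1 + steepness) entries and the last layer gets
    avg_budget * (1 - steepness).
    """
    if not 0 <= steepness < 1:
        raise ValueError("steepness must be in [0, 1)")
    if num_layers == 1:
        return [avg_budget]
    top = avg_budget * (1 + steepness)
    bottom = avg_budget * (1 - steepness)
    step = (top - bottom) / (num_layers - 1)
    # Rounding to whole tokens can shift the total slightly.
    return [round(top - i * step) for i in range(num_layers)]


flat = uniform_budgets(num_layers=32, avg_budget=128)
pyramid = pyramid_budgets(num_layers=32, avg_budget=128, steepness=0.5)
print("first layer:", pyramid[0], "last layer:", pyramid[-1])
print("total entries:", sum(pyramid), "vs uniform:", sum(flat))
```

A schedule like this is fixed before any input is seen, which is the property the EvolKV, VL-Cache, and LAVa quotes object to: the budget a layer actually needs may not decrease with depth, and it may differ from one input to the next.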

Beaten on benchmarks

Head-to-head results where a newer method reports beating PyramidKV. Values are copied from the source paper's tables — verify against the cited paper.

Ada-PyramidKV beats PyramidKV · Ave. Score [B=128]
42.96 vs 41.78
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
LAQ beats PyramidKV · Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 128]
39.29 vs 36.16
Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query
LAQ beats PyramidKV · Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 256]
40.51 vs 38.63
Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query
LAQ beats PyramidKV · Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 512]
41.40 vs 40.39
Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query
EvolKV beats PyramidKV · Avg. [KV Size = 128]
36.64 vs 36.11
EvolKV: Evolutionary KV Cache Compression for LLM Inference
EvolKV beats PyramidKV · Avg. [KV Size = 256]
39.10 vs 38.70
EvolKV: Evolutionary KV Cache Compression for LLM Inference
EvolKV beats PyramidKV · Avg. [KV Size = 512]
41.32 vs 40.26
EvolKV: Evolutionary KV Cache Compression for LLM Inference
EvolKV beats PyramidKV · Avg. [KV Size = 1024]
41.72 vs 41.29
EvolKV: Evolutionary KV Cache Compression for LLM Inference
VL-Cache beats PyramidKV · CIDEr [Coco-Caption, LLaVA-Mistral-7B, 10% cache budget]
100.36 vs 66.41
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
VL-Cache beats PyramidKV · CIDEr [Coco-Caption, LLaVA-1.6-34B, 10% cache budget]
137.35 vs 116.91
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
VL-Cache beats PyramidKV · ANLS [DocVQA, LLaVA-Mistral-7B, 10% cache budget]
62 vs 60
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
VL-Cache beats PyramidKV · ANLS [DocVQA, LLaVA-1.6-34B, 10% cache budget]
84 vs 83
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
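
The size of these margins varies widely, from well under one point to more than thirty. The sketch below computes the absolute and relative gap for a few of the rows above. The scores are copied from this listing; the `margin` helper and the tuple layout are assumptions made for illustration, and the caveat about checking values against the cited papers still applies.

```python
# (newer method, setting, newer score, PyramidKV score), copied from the listing.
results = [
    ("Ada-PyramidKV", "Ave. Score [B=128]", 42.96, 41.78),
    ("LAQ", "Avg [Mistral-7B-v0.2-Instruct, KV Cache Size = 128]", 39.29, 36.16),
    ("EvolKV", "Avg. [KV Size = 128]", 36.64, 36.11),
    ("VL-Cache", "CIDEr [Coco-Caption, LLaVA-Mistral-7B, 10% cache budget]", 100.36, 66.41),
]


def margin(new_score: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gap, relative gap in percent) over the baseline."""
    gap = new_score - baseline
    return gap, 100.0 * gap / baseline


for method, setting, new_score, baseline in results:
    gap, pct = margin(new_score, baseline)
    print(f"{method:14s} +{gap:6.2f} ({pct:5.1f}%)  {setting}")
```

Scores on different metrics (for example LongBench-style averages versus CIDEr) are not comparable with each other, so the relative gap is only meaningful within a single row.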

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.