Is PyramidKV superseded?

PyramidKV (Long-context / context-window extension) is superseded: it is cited as a baseline and beaten by newer methods. 1 paper critiques it and 2 beat it on benchmarks, which ranks it #23 of 53 most-superseded. Sub-problem: cluster led by StreamingLLM. Newer alternatives in the same sub-problem include BA-Att, CSAttention, TCA-Attention, and Dynamic Hierarchical Sparse Attention (DHSA).

Superseded baseline · #23 of 53 most-superseded

PyramidKV

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Long-context / context-window extension · first seen Jun 4, 2024

superseded — cited as a baseline and beaten by newer methods

1 paper critiques it · 2 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites PyramidKV as a baseline.

“While these methods differ in selecting tokens for KV cache retention, they generally apply a uniform budget size across layers, even though the optimal budget size may vary.”
— ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
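
The distinction the quote draws, between one budget size for every layer and a budget that varies by layer, can be illustrated with a minimal sketch. The function names and the per-layer weights here are hypothetical placeholders chosen for illustration; they are not the allocation rule of PyramidKV, ZigZagKV, or any other method on this page.

```python
def uniform_budgets(total_budget: int, num_layers: int) -> list[int]:
    """Give every layer the same number of retained KV entries."""
    return [total_budget // num_layers] * num_layers


def weighted_budgets(total_budget: int, weights: list[float]) -> list[int]:
    """Split the same total across layers in proportion to per-layer weights.

    The weights are an assumption of this sketch (any non-negative score per
    layer); a real method would derive them from its own criterion.
    """
    total_weight = sum(weights)
    budgets = [int(total_budget * w / total_weight) for w in weights]
    # Rounding down can leave a few entries unassigned; give them to the
    # highest-weight layer so the total budget is preserved exactly.
    budgets[weights.index(max(weights))] += total_budget - sum(budgets)
    return budgets


if __name__ == "__main__":
    print(uniform_budgets(512, 4))
    print(weighted_budgets(512, [4.0, 3.0, 2.0, 1.0]))
```

Both functions spend the same total, so any difference in downstream quality would come from where the budget is placed rather than how much is kept.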

Beaten on benchmarks

Head-to-head results where a newer method reports beating PyramidKV. Values are copied from the source paper's tables — verify against the cited paper.

ZigZagKV beats PyramidKV on every result below (ZigZagKV value vs PyramidKV value). Source for all rows: ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty

Avg. [KV Size = 128] · 43.30 vs 43.16
Attention loss [Mistral Budget 128] · 2.447 vs 2.960
Attention loss [Mistral Budget 256] · 1.249 vs 1.592
Attention loss [Mistral Budget 512] · 0.611 vs 0.885
Attention loss [LLaMA Budget 128] · 1.504 vs 1.912
Attention loss [LLaMA Budget 256] · 0.637 vs 0.879
Attention loss [LLaMA Budget 512] · 0.226 vs 0.424
Hidden state loss [Mistral Budget 128] · 2.544 vs 2.977
Hidden state loss [Mistral Budget 256] · 1.495 vs 1.595
Hidden state loss [Mistral Budget 512] · 0.830 vs 0.870
Hidden state loss [LLaMA Budget 128] · 2.918 vs 3.256
Hidden state loss [LLaMA Budget 256] · 1.668 vs 1.755
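
To gauge how large these gaps are, the loss values listed above can be turned into relative reductions. The sketch below copies the numbers from this page and assumes that lower is better for both loss metrics. The helper name `relative_reduction` is introduced here for illustration only, and the Avg. row is left out because it is a score rather than a loss.

```python
# (metric, setting, ZigZagKV value, PyramidKV value), copied from the listing above.
results = [
    ("Attention loss", "Mistral Budget 128", 2.447, 2.960),
    ("Attention loss", "Mistral Budget 256", 1.249, 1.592),
    ("Attention loss", "Mistral Budget 512", 0.611, 0.885),
    ("Attention loss", "LLaMA Budget 128", 1.504, 1.912),
    ("Attention loss", "LLaMA Budget 256", 0.637, 0.879),
    ("Attention loss", "LLaMA Budget 512", 0.226, 0.424),
    ("Hidden state loss", "Mistral Budget 128", 2.544, 2.977),
    ("Hidden state loss", "Mistral Budget 256", 1.495, 1.595),
    ("Hidden state loss", "Mistral Budget 512", 0.830, 0.870),
    ("Hidden state loss", "LLaMA Budget 128", 2.918, 3.256),
    ("Hidden state loss", "LLaMA Budget 256", 1.668, 1.755),
]


def relative_reduction(new: float, old: float) -> float:
    """Fraction by which `new` is lower than `old` (positive means a lower loss)."""
    return (old - new) / old


for metric, setting, zigzag, pyramid in results:
    reduction = relative_reduction(zigzag, pyramid)
    print(f"{metric} [{setting}]: {reduction:.1%} lower than PyramidKV")
```

These ratios only restate the reported numbers; they say nothing about variance across runs, so the source paper's tables remain the reference.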

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.