PyramidKV
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Long-context / context-window extension · first seen Jun 4, 2024
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites PyramidKV as a baseline.
“While these methods differ in selecting tokens for KV cache retention, they generally apply a uniform budget size across layers, even though the optimal budget size may vary.”
— ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
Beaten on benchmarks
Head-to-head results where a newer method reports beating PyramidKV. Values are copied from the source paper's tables — verify against the cited paper.
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Avg. [KV Size = 128]
43.30 vs 43.16
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Attention loss [Mistral Budget 128]
2.447 vs 2.960
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Attention loss [Mistral Budget 256]
1.249 vs 1.592
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Attention loss [Mistral Budget 512]
0.611 vs 0.885
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Attention loss [LLaMA Budget 128]
1.504 vs 1.912
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Attention loss [LLaMA Budget 256]
0.637 vs 0.879
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Attention loss [LLaMA Budget 512]
0.226 vs 0.424
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Hidden state loss [Mistral Budget 128]
2.544 vs 2.977
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Hidden state loss [Mistral Budget 256]
1.495 vs 1.595
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Hidden state loss [Mistral Budget 512]
0.830 vs 0.870
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Hidden state loss [LLaMA Budget 128]
2.918 vs 3.256
- ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
ZigZagKV beats PyramidKV · Hidden state loss [LLaMA Budget 256]
1.668 vs 1.755
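To put the gaps above in perspective, the following minimal sketch computes the relative difference between the two methods for each listed result. The numbers are copied from the entries above. The names `RESULTS` and `relative_gap` are hypothetical helpers introduced here for illustration, and the sketch assumes that higher is better for Avg. and lower is better for the two loss metrics, which matches the "beats" direction reported in each entry.

```python
# Values copied from the head-to-head entries above:
# (metric, setting, ZigZagKV value, PyramidKV value).
RESULTS = [
    ("Avg.", "KV Size = 128", 43.30, 43.16),
    ("Attention loss", "Mistral Budget 128", 2.447, 2.960),
    ("Attention loss", "Mistral Budget 256", 1.249, 1.592),
    ("Attention loss", "Mistral Budget 512", 0.611, 0.885),
    ("Attention loss", "LLaMA Budget 128", 1.504, 1.912),
    ("Attention loss", "LLaMA Budget 256", 0.637, 0.879),
    ("Attention loss", "LLaMA Budget 512", 0.226, 0.424),
    ("Hidden state loss", "Mistral Budget 128", 2.544, 2.977),
    ("Hidden state loss", "Mistral Budget 256", 1.495, 1.595),
    ("Hidden state loss", "Mistral Budget 512", 0.830, 0.870),
    ("Hidden state loss", "LLaMA Budget 128", 2.918, 3.256),
    ("Hidden state loss", "LLaMA Budget 256", 1.668, 1.755),
]


def relative_gap(zigzag: float, pyramid: float) -> float:
    """Signed difference of ZigZagKV relative to PyramidKV, in percent.

    Positive means ZigZagKV's value is higher; negative means it is lower.
    """
    return (zigzag - pyramid) / pyramid * 100.0


if __name__ == "__main__":
    for metric, setting, zigzag, pyramid in RESULTS:
        gap = relative_gap(zigzag, pyramid)
        print(f"{metric:<18} [{setting}] {gap:+.1f}%")
```

Read this way, the Avg. result is a small margin while the loss metrics show larger relative differences, especially for attention loss. The reported values should still be verified against the cited paper before drawing conclusions.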
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.