TOVA (KV-cache compression): superseded, cited as a baseline and beaten by newer methods. 6 papers critique it and 14 beat it on benchmarks, which ranks it #7 of 234 most-superseded. Sub-problem: cluster led by SnapKV. Newer alternatives in the same sub-problem include STaR-KV, GRKV, MomentKV, NestedKV, and IndexMem.

Superseded baseline · #7 of 234 most-superseded

TOVA

Transformers are Multi-State RNNs

KV-cache compression · first seen Jan 11, 2024

superseded — cited as a baseline and beaten by newer methods

6 papers critique it · 14 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites TOVA as a baseline.

“these methods require access to the full attention matrix, making them incompatible with Flash Attention and thus impractical for modern deployment scenarios”
— Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
“They fix the budget of KV Cache in a finite level, but don't distinguish the differences between layers and between heads.”
— LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
“These methods, however, often overlook the structure of key information distribution by naively evicting tokens across the entire sequence.”
— TreeKV: Smooth Key-Value Cache Compression with Tree Structures
“While effective, most methods either discard unused tokens too early or require full cache for scoring.”
— PiKV: KV Cache Management System for Mixture of Experts
“However, these methods rely primarily on attention weights and often overlook the contribution of value states in shaping the final model outputs.”
— OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference
“TOVA retains attention sinks and a sliding window of recent tokens; a credential at relative depth 0.5 sits 2,000 tokens outside the window.”
— Transactional Attention: Semantic Sponsorship for KV-Cache Retention
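
Several of the critiques above target the same mechanism: ranking cached tokens by attention weight and evicting the lowest-ranked ones under a fixed budget. The sketch below illustrates that general family in plain Python so the objections are easier to see. It is not taken from the TOVA paper or from any of the citing papers; the function name `evict_lowest_attention`, the `(position, key, value)` cache layout, and the single-head, single-query setup are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def evict_lowest_attention(cache, query, budget):
    """Keep the `budget` cache entries with the highest attention weight.

    cache:  list of (position, key_vector, value_vector) tuples (hypothetical layout)
    query:  the current query vector, same length as each key vector
    budget: maximum number of entries to retain (one fixed, global number)
    """
    if len(cache) <= budget:
        return list(cache)

    # Explicit attention weights are needed here. Fused kernels such as
    # Flash Attention do not materialize them, which is the first critique.
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for _, key, _ in cache
    ]
    weights = softmax(scores)

    # Ranking uses keys only; the value vectors never enter the decision,
    # which is the objection raised in the OBCache quote.
    ranked = sorted(range(len(cache)), key=lambda i: weights[i], reverse=True)
    kept = sorted(ranked[:budget])  # restore original token order
    return [cache[i] for i in kept]


# Toy cache: four tokens with 2-d keys and 1-d values.
cache = [
    (0, [1.0, 0.0], [9.0]),
    (1, [0.0, 1.0], [9.0]),
    (2, [-1.0, 0.0], [100.0]),  # large value, but low attention weight
    (3, [0.5, 0.5], [9.0]),
]
kept = evict_lowest_attention(cache, query=[1.0, 0.0], budget=2)
print([position for position, _, _ in kept])
```

In this toy run the token at position 2 is evicted even though its value vector is by far the largest, because only the key-query score is consulted. The same `budget` would also be applied to every layer and head in a real model unless the caller varies it, which is the limitation the LAVa quote points at.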

Beaten on benchmarks

Head-to-head results where a newer method reports beating TOVA. Values are copied from the source paper's tables — verify against the cited paper.

Expected Attention (EA) beats TOVA · score [Qwen, RULER 4K, 50% compression]
94.7 vs 77.6
Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
Expected Attention (EA) beats TOVA · score [Gemma, RULER 4K, 50% compression]
92.7 vs 76.5
Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
Expected Attention (EA) beats TOVA · score [Qwen, RULER 16K, 50% compression]
92.7 vs 76.2
Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
Expected Attention (EA) beats TOVA · score [Gemma, RULER 16K, 50% compression]
76.6 vs 62.5
Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
Expected Attention (EA) beats TOVA · score [Qwen, LongBench, 25% compression]
50.25 vs 48.14
Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
KVTC beats TOVA · LITM [Llama 3.1 8B]
99.3 vs 1.2
KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats TOVA · LITM [MN-Minitron 8B]
99.3 vs 0.3
KV Cache Transform Coding for Compact Storage in LLM Inference
KVTC beats TOVA · LITM [Mistral NeMo 12B]
99.8 vs 8.7
KV Cache Transform Coding for Compact Storage in LLM Inference
AhaKV beats TOVA · Average [LLaMA3-8B-Inst]
41.63 vs 40.18
AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
AhaKV beats TOVA · Average [Qwen2-7B-Inst]
41.84 vs 37.99
AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
AhaKV beats TOVA · Average [LLAMA2-7B-Chat]
26.78 vs 24.66
AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
AhaKV beats TOVA · Average [Gemma-7B-Inst]
33.08 vs 30.80
AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.