AdaKV
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
KV-cache compression · first seen Jul 16, 2024
superseded — cited as a baseline and beaten by newer methods
7 papers critique it · 7 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites AdaKV as a baseline.
“However they still quantize (if at all) uniformly.”
— MoE-nD: Per-Layer Mixture-of-Experts Routing for Multi-Axis KV Cache Compression
“However, these methods often rely on experimental observations and pre-define some rules for cache budget allocation and KV Cache eviction.”
— LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
“proposed dynamic head-level allocation using attention scores but still relied on layer-level budgeting.”
— Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
“AdaKV achieves theoretical optimality in allocation at the attention score level given a fixed budget, though this does not always translate to optimal end-to-end performance. These works make important contributions to the allocation problem, but they also exacerbate the fundamental challenge: how should the budget be determined in the first place?”
— Adaptive KV-Cache Compression without Manually Setting Budget
“adaptive methods like Ada-KV and D2O rely on calculating attention scores to allocate budgets or select tokens. This creates an inference-time circular dependency: identifying important components requires performing the heavy query-key interactions (O(t^2) complexity) that we aim to avoid”
— LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction
“existing methods evaluate attention head importance independently. For example, AdaKV evaluates the concentration degrees of heads while HeadKV assesses the retrieval-reasoning capability of each head in isolation as a measure of importance. However, these approaches treat heads as isolated units, overlooking the fact that their true importance emerges from their cooperation rather than individual capabilities.”
— CoKV: Optimizing KV Cache Allocation via Cooperative Game
“all eviction methods share the same post-eviction inference procedure: attention is renormalized exclusively over the retained KV pairs, and the evicted ones leave no trace in subsequent operations”
— MomentKV: Closing the Directional Gap in KV Cache Eviction for Long-Context Inference
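The critiques above turn on two mechanisms: head-level budget allocation from attention scores under a fixed layer budget, and attention being renormalized over only the retained KV pairs after eviction. The sketch below illustrates both in plain Python under simplifying assumptions. It is not the Ada-KV implementation: the function names, the toy scores, and the pooled top-k selection rule are illustrative choices made for this example.

```python
import math
from typing import List


def allocate_head_budgets(scores: List[List[float]], total_budget: int) -> List[List[int]]:
    """Keep the `total_budget` highest-scoring (head, token) pairs pooled across heads.

    `scores[h][t]` is an attention-derived importance score (assumed given).
    Returns, for each head, the sorted token indices that are retained.
    Per-head budgets are simply the lengths of the returned lists.
    """
    pairs = [(s, h, t) for h, row in enumerate(scores) for t, s in enumerate(row)]
    # Highest score first; ties are broken by head, then token index, for determinism.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    kept: List[List[int]] = [[] for _ in scores]
    for _, h, t in pairs[:total_budget]:
        kept[h].append(t)
    return [sorted(k) for k in kept]


def renormalized_attention(logits: List[float], kept: List[int]) -> List[float]:
    """Softmax over retained positions only; evicted positions contribute nothing."""
    if not kept:
        return []
    m = max(logits[t] for t in kept)  # subtract the max for numerical stability
    exps = [math.exp(logits[t] - m) for t in kept]
    z = sum(exps)
    return [e / z for e in exps]


# Toy example: head 0 is concentrated on one token, head 1 is spread out.
scores = [
    [0.85, 0.05, 0.05, 0.05],
    [0.30, 0.25, 0.25, 0.20],
]
kept = allocate_head_budgets(scores, total_budget=4)  # average of 2 per head
budgets = [len(k) for k in kept]

# Attention for head 1 after eviction, computed from hypothetical logits.
weights = renormalized_attention([1.0, 0.8, 0.8, 0.6], kept[1])
```

In this toy case the concentrated head keeps one token and the dispersed head keeps three, even though the average budget is two per head. Both steps need the attention scores as input, which is the dependency the LKV quote objects to.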
Beaten on benchmarks
Head-to-head results where a newer method reports beating AdaKV. Values are copied from the source paper's tables — verify against the cited paper.
- AudioKV: KV Cache Eviction in Efficient Large Audio Language Models
AudioKV beats AdaKV · Average accuracy (ZH+EN+FR+DE+ES ASR) [Qwen2.5-Omni-7B, retention=0.8]
93.1 vs 15.8
- Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
MixKV + AdaKV beats AdaKV · DocVQA (%) [LLaVA-NeXT-Mistral-7B, Budget 256]
61.3 vs 59.6
- Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
MixKV + AdaKV beats AdaKV · DocVQA (%) [LLaVA-NeXT-Mistral-7B, Budget 128]
58.3 vs 55.9
- Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
MixKV + AdaKV beats AdaKV · DocVQA (%) [LLaVA-NeXT-Mistral-7B, Budget 64]
50.8 vs 48.7
- Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Meta-Llama-3.1-8B-Instruct, 128L]
45.19 vs 43.12
- Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Meta-Llama-3.1-8B-Instruct, 256L]
47.22 vs 45.81
- Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Meta-Llama-3.1-8B-Instruct, 512L]
48.23 vs 47.71
- Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Mistral-7B-Instruct-v0.3, 128L]
44.00 vs 40.95
- Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Mistral-7B-Instruct-v0.3, 256L]
45.08 vs 44.22
- Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Mistral-7B-Instruct-v0.3, 512L]
46.74 vs 45.74
- RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
RDKV beats AdaKV · Avg. [B_total=64L]
45.97 vs 39.59
- RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
RDKV beats AdaKV · Avg. [B_total=128L]
47.75 vs 43.64
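The size of each gap is easier to compare once the margins are computed. The short script below does that arithmetic for the LaProx and RDKV rows, which both report an "Avg." score. The numbers are copied from the entries above. The tuple layout and variable names are choices made for this sketch, and the two papers' averages may not be computed over the same benchmark set, so margins should only be compared within one paper.

```python
# (method, setting, newer method's score, AdaKV's score), copied from the list above.
results = [
    ("LaProx", "Meta-Llama-3.1-8B-Instruct, 128L", 45.19, 43.12),
    ("LaProx", "Meta-Llama-3.1-8B-Instruct, 256L", 47.22, 45.81),
    ("LaProx", "Meta-Llama-3.1-8B-Instruct, 512L", 48.23, 47.71),
    ("LaProx", "Mistral-7B-Instruct-v0.3, 128L", 44.00, 40.95),
    ("LaProx", "Mistral-7B-Instruct-v0.3, 256L", 45.08, 44.22),
    ("LaProx", "Mistral-7B-Instruct-v0.3, 512L", 46.74, 45.74),
    ("RDKV", "B_total=64L", 45.97, 39.59),
    ("RDKV", "B_total=128L", 47.75, 43.64),
]

# Margin = newer method's score minus AdaKV's score, rounded to two decimals.
margins = [(method, setting, round(new - base, 2)) for method, setting, new, base in results]

for method, setting, delta in margins:
    print(f"{method:6s} {setting:34s} +{delta:.2f}")
```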
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- STaR-KV · STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models · Jun 1, 2026
- May 29, 2026
- May 28, 2026
- May 26, 2026
- May 25, 2026
- CONF-KV · CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM · May 24, 2026
- May 21, 2026
- May 12, 2026
- Global Retention-Based KV Eviction · Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction · May 10, 2026
- ReST-KV · ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing · May 9, 2026
- May 8, 2026
- fixed-contract diagnostic · When Does Value-Aware KV Eviction Help? A Fixed-Contract Diagnostic for Non-Monotone Cache Compression · May 7, 2026