AdaKV (KV-cache compression): superseded; cited as a baseline and beaten by newer methods. 7 papers critique it and 7 beat it on benchmarks, ranking it #8 of 234 most-superseded. Sub-problem: cluster led by SnapKV. Newer alternatives in the same sub-problem include STaR-KV, GRKV, MomentKV, NestedKV, IndexMem.

Superseded baseline · #8 of 234 most-superseded

AdaKV

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

KV-cache compression · first seen Jul 16, 2024

superseded — cited as a baseline and beaten by newer methods

7 papers critique it · 7 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites AdaKV as a baseline.

“However they still quantize (if at all) uniformly.”
— MoE-nD: Per-Layer Mixture-of-Experts Routing for Multi-Axis KV Cache Compression
“However, these methods often rely on experimental observations and pre-define some rules for cache budget allocation and KV Cache eviction.”
— LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
“proposed dynamic head-level allocation using attention scores but still relied on layer-level budgeting.”
— Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
“AdaKV achieves theoretical optimality in allocation at the attention score level given a fixed budget, though this does not always translate to optimal end-to-end performance. These works make important contributions to the allocation problem, but they also exacerbate the fundamental challenge: how should the budget be determined in the first place?”
— Adaptive KV-Cache Compression without Manually Setting Budget
“adaptive methods like Ada-KV and D2O rely on calculating attention scores to allocate budgets or select tokens. This creates an inference-time circular dependency: identifying important components requires performing the heavy query-key interactions (O(t^2) complexity) that we aim to avoid”
— LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction
“existing methods evaluate attention head importance independently. For example, AdaKV evaluates the concentration degrees of heads while HeadKV assesses the retrieval-reasoning capability of each head in isolation as a measure of importance. However, these approaches treat heads as isolated units, overlooking the fact that their true importance emerges from their cooperation rather than individual capabilities.”
— CoKV: Optimizing KV Cache Allocation via Cooperative Game
“all eviction methods share the same post-eviction inference procedure: attention is renormalized exclusively over the retained KV pairs, and the evicted ones leave no trace in subsequent operations”
— MomentKV: Closing the Directional Gap in KV Cache Eviction for Long-Context Inference
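
Two mechanisms named in the quotes above can be illustrated with a toy example: splitting a fixed layer-level budget across heads by attention score, and renormalizing attention over only the retained KV pairs. This is a minimal sketch under stated assumptions, not the AdaKV reference implementation: the function names `allocate_head_budgets` and `renormalized_attention` are hypothetical, the scores are made up, and the pooled top-k rule is a simplified reading of adaptive head-level allocation.

```python
import math


def allocate_head_budgets(head_scores, total_budget):
    """Split one layer's fixed budget across heads by pooling all scores.

    head_scores: one list of non-negative token scores per head.
    Returns the sorted indices of the tokens kept for each head.
    """
    # Pool (score, head, token) triples from every head in the layer.
    pooled = [
        (score, h, t)
        for h, scores in enumerate(head_scores)
        for t, score in enumerate(scores)
    ]
    # Keep the globally highest scores; ties go to the lower head/token index.
    pooled.sort(key=lambda x: (-x[0], x[1], x[2]))
    kept = [[] for _ in head_scores]
    for _, h, t in pooled[:total_budget]:
        kept[h].append(t)
    return [sorted(k) for k in kept]


def renormalized_attention(logits, kept):
    """Softmax restricted to retained positions; evicted ones get no weight."""
    if not kept:
        return {}
    m = max(logits[t] for t in kept)  # subtract the max for numerical stability
    exps = {t: math.exp(logits[t] - m) for t in kept}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


# Toy layer (made-up numbers): head 0 is concentrated, head 1 is diffuse.
head_scores = [
    [0.70, 0.20, 0.05, 0.05],
    [0.30, 0.25, 0.25, 0.20],
]
kept = allocate_head_budgets(head_scores, total_budget=4)
budgets = [len(k) for k in kept]

# Attention for head 1 after eviction: weights sum to 1 over kept tokens only.
weights = renormalized_attention([1.2, 1.0, 1.0, 0.8], kept[1])
```

In this toy layer the concentrated head ends up with a smaller share of the budget than the diffuse one, while the layer total stays fixed, which is the layer-level budgeting the Not All Heads Matter quote points at. The renormalization step shows why evicted entries "leave no trace": their probability mass is redistributed over whatever was kept.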

Beaten on benchmarks

Head-to-head results where a newer method reports beating AdaKV. Values are copied from the source paper's tables — verify against the cited paper.

AudioKV beats AdaKV · Average accuracy (ZH+EN+FR+DE+ES ASR) [Qwen2.5-Omni-7B, retention=0.8]
93.1 vs 15.8
AudioKV: KV Cache Eviction in Efficient Large Audio Language Models
MixKV + AdaKV beats AdaKV · DocVQA (%) [LLaVA-NeXT-Mistral-7B, Budget 256]
61.3 vs 59.6
Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
MixKV + AdaKV beats AdaKV · DocVQA (%) [LLaVA-NeXT-Mistral-7B, Budget 128]
58.3 vs 55.9
Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
MixKV + AdaKV beats AdaKV · DocVQA (%) [LLaVA-NeXT-Mistral-7B, Budget 64]
50.8 vs 48.7
Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
LaProx beats AdaKV · Avg. [Meta-Llama-3.1-8B-Instruct 128L]
45.19 vs 43.12
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Meta-Llama-3.1-8B-Instruct 256L]
47.22 vs 45.81
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Meta-Llama-3.1-8B-Instruct 512L]
48.23 vs 47.71
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Mistral-7B-Instruct-v0.3 128L]
44.00 vs 40.95
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Mistral-7B-Instruct-v0.3 256L]
45.08 vs 44.22
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
LaProx beats AdaKV · Avg. [Mistral-7B-Instruct-v0.3 512L]
46.74 vs 45.74
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
RDKV beats AdaKV · Avg. [B_total=64L]
45.97 vs 39.59
RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
RDKV beats AdaKV · Avg. [B_total=128L]
47.75 vs 43.64
RDKV: Rate-Distortion Bit Allocation for Joint Eviction and Quantization of the KV Cache
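
To compare these entries at a glance, the absolute margin over AdaKV can be computed per result. The sketch below only subtracts the values listed above; the tuple layout and the name `margins` are illustrative choices, and margins from different benchmarks, models, or papers are not directly comparable with each other.

```python
# (newer method, setting, newer score, AdaKV score), copied from the entries above.
results = [
    ("AudioKV", "Qwen2.5-Omni-7B, retention=0.8", 93.1, 15.8),
    ("MixKV + AdaKV", "LLaVA-NeXT-Mistral-7B, Budget 256", 61.3, 59.6),
    ("MixKV + AdaKV", "LLaVA-NeXT-Mistral-7B, Budget 128", 58.3, 55.9),
    ("MixKV + AdaKV", "LLaVA-NeXT-Mistral-7B, Budget 64", 50.8, 48.7),
    ("LaProx", "Meta-Llama-3.1-8B-Instruct 128L", 45.19, 43.12),
    ("LaProx", "Meta-Llama-3.1-8B-Instruct 256L", 47.22, 45.81),
    ("LaProx", "Meta-Llama-3.1-8B-Instruct 512L", 48.23, 47.71),
    ("LaProx", "Mistral-7B-Instruct-v0.3 128L", 44.00, 40.95),
    ("LaProx", "Mistral-7B-Instruct-v0.3 256L", 45.08, 44.22),
    ("LaProx", "Mistral-7B-Instruct-v0.3 512L", 46.74, 45.74),
    ("RDKV", "B_total=64L", 45.97, 39.59),
    ("RDKV", "B_total=128L", 47.75, 43.64),
]

# Absolute margin in the metric's own units, rounded to hide float noise.
margins = {
    (method, setting): round(newer - adakv, 2)
    for method, setting, newer, adakv in results
}

for (method, setting), margin in margins.items():
    print(f"{method:14s} {setting:36s} +{margin:.2f}")
```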

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base. They include STaR-KV, GRKV, MomentKV, NestedKV, and IndexMem.