LGCLApr 22

LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction

arXiv:2605.0667671.3
Predicted impact top 24% in LG · last 90 daysOriginality Highly original
AI Analysis

This work addresses the memory bottleneck in long-context LLM inference by replacing heuristic-based KV cache compression with a learned, task-optimized method, achieving state-of-the-art results.

LKV introduces an end-to-end differentiable approach for KV cache eviction in LLMs, learning both head-wise budgets and token importance to achieve near-lossless performance with only 15% KV cache retention on LongBench, outperforming heuristic methods.

Long-context inference in Large Language Models (LLMs) is bottlenecked by the linear growth of Key-Value (KV) cache memory. Existing KV cache compression paradigms are fundamentally limited by heuristics: heuristic budgeting relies on statistical priors rather than task objectives, causing resource misallocation, while heuristic selection relies on coupled query-key interactions or static inductive biases (e.g., attention sinks). To address this limitation, we introduce LKV (Learned KV Eviction), which formulates KV compression as an end-to-end differentiable optimization problem. LKV integrates LKV-H to learn task-optimized global budgets, and LKV-T to derive intrinsic KV importance without materializing attention matrices. This design bypasses heuristic proxies, strictly aligning compression with task objectives. Extensive evaluations demonstrate that LKV achieves state-of-the-art performance on both LongBench and RULER benchmarks at high compression rates. In particular, on LongBench, LKV achieves near-lossless performance with only 15\% KV cache retention. Crucially, our analysis identifies learned budgeting as the dominant driver of fidelity, demonstrating that data-driven allocation is essential to overcome the limitations of hand-crafted heuristics.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes