LG · CL · Apr 22

LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction

Enshuai Zhou, Yifan Hao, Chao Wang, Rui Zhang, Di Huang, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, Yunji Chen

arXiv:2605.0667671.3

Predicted impact: top 24% in LG · last 90 days · Originality: Highly original

AI Analysis

This work addresses the memory bottleneck in long-context LLM inference by replacing heuristic-based KV cache compression with a learned, task-optimized method. It achieves state-of-the-art results on LongBench and RULER at high compression rates.
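The size of the memory bottleneck can be worked out directly. The sketch below computes the KV cache footprint for one sequence. It assumes a hypothetical decoder with 32 layers, 8 key-value heads (grouped-query attention), head dimension 128, and 16-bit storage. These numbers are illustrative and do not come from the paper. The footprint grows linearly with sequence length, and keeping 15% of the cached tokens shrinks it proportionally.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes to cache keys and values for one sequence.

    The leading 2 counts both K and V; bytes_per_elem=2 assumes fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical model shape, chosen for illustration only.
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                      seq_len=128 * 1024)
kept = 0.15 * full  # 15% retention

print(f"full cache: {full / GIB:.1f} GiB")  # full cache: 16.0 GiB
print(f"15% kept:   {kept / GIB:.1f} GiB")  # 15% kept:   2.4 GiB
```

Every factor except `seq_len` is fixed by the model, so doubling the context doubles the cache. Eviction targets that factor by reducing how many tokens each head keeps.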

LKV introduces an end-to-end differentiable approach for KV cache eviction in LLMs, learning both head-wise budgets and token importance to achieve near-lossless performance with only 15% KV cache retention on LongBench, outperforming heuristic methods.
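At inference time, learned head-wise budgets and per-token importance scores can be combined into a simple eviction rule. First, split a global retention budget across heads. Then keep each head's highest-scoring tokens. The code below is a minimal sketch under that assumption and is not taken from the LKV paper. The function names (`allocate_budgets`, `select_tokens`, `evict`), the proportional split with largest-remainder rounding, and the toy scores and head weights are all hypothetical. The toy example uses 25% retention instead of 15% only to keep the numbers small.

```python
import math


def allocate_budgets(head_weights, total_kept):
    """Split an integer token budget across heads in proportion to head_weights.

    Largest-remainder rounding keeps the integer budgets summing to total_kept.
    """
    total_w = sum(head_weights)
    raw = [w / total_w * total_kept for w in head_weights]
    budgets = [math.floor(r) for r in raw]
    leftover = total_kept - sum(budgets)
    # Give the remaining slots to the heads with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda h: raw[h] - budgets[h], reverse=True)
    for h in order[:leftover]:
        budgets[h] += 1
    return budgets


def select_tokens(scores, budget):
    """Indices of the `budget` highest-scoring tokens, returned in sequence order.

    If budget exceeds the sequence length, every token is kept.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def evict(per_head_scores, head_weights, retention=0.15):
    """Keep `retention` of all (head, token) entries, split by head weights."""
    num_heads = len(per_head_scores)
    seq_len = len(per_head_scores[0])
    total_kept = round(retention * seq_len * num_heads)
    budgets = allocate_budgets(head_weights, total_kept)
    return [select_tokens(s, b) for s, b in zip(per_head_scores, budgets)]


# Toy example: 2 heads, 8 cached tokens, made-up scores and head weights.
per_head_scores = [
    [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.4],
    [0.5, 0.1, 0.2, 0.3, 0.9, 0.0, 0.6, 0.4],
]
print(evict(per_head_scores, head_weights=[3, 1], retention=0.25))
# [[1, 3, 5], [4]]
```

Head 0 receives three of the four slots because its weight is three times that of head 1. A uniform split would give each head two. This non-uniform allocation is the part that learned budgeting replaces, in place of a fixed per-head quota.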

Long-context inference in Large Language Models (LLMs) is bottlenecked by the linear growth of Key-Value (KV) cache memory. Existing KV cache compression paradigms are fundamentally limited by heuristics: heuristic budgeting relies on statistical priors rather than task objectives, causing resource misallocation, while heuristic selection relies on coupled query-key interactions or static inductive biases (e.g., attention sinks). To address this limitation, we introduce LKV (Learned KV Eviction), which formulates KV compression as an end-to-end differentiable optimization problem. LKV integrates LKV-H to learn task-optimized global budgets, and LKV-T to derive intrinsic KV importance without materializing attention matrices. This design bypasses heuristic proxies, strictly aligning compression with task objectives. Extensive evaluations demonstrate that LKV achieves state-of-the-art performance on both LongBench and RULER benchmarks at high compression rates. In particular, on LongBench, LKV achieves near-lossless performance with only 15% KV cache retention. Crucially, our analysis identifies learned budgeting as the dominant driver of fidelity, demonstrating that data-driven allocation is essential to overcome the limitations of hand-crafted heuristics.
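The abstract frames eviction as an end-to-end differentiable problem. This sketch shows one common way to make both discrete decisions differentiable. Per-head budgets become `total_budget * softmax(head_logits)`. Top-k selection becomes a sigmoid gate whose threshold is solved by bisection so that the soft mask sums to the budget. Each token's score is computed from its own key and value vectors, so no attention matrix is formed. This relaxation is an assumption for illustration and is not necessarily the formulation used by LKV-H or LKV-T. The linear scorer, its weights, and the temperature are hypothetical. Plain Python shows only the forward pass. Training would require an autodiff framework so that gradients from the task loss reach `head_logits` and the scorer weights.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def soft_head_budgets(head_logits, total_budget):
    """Continuous per-head budgets that always sum to total_budget.

    Smooth in head_logits, so a task loss could shift budget between heads.
    """
    return [total_budget * p for p in softmax(head_logits)]


def intrinsic_score(key, value, w_k, w_v, bias=0.0):
    """Hypothetical query-free importance: a linear probe on the token's K and V.

    Costs O(head_dim) per token; no n x n attention matrix is built.
    """
    return (sum(k * w for k, w in zip(key, w_k))
            + sum(v * w for v, w in zip(value, w_v))
            + bias)


def sigmoid(x):
    # Branching avoids overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_keep_mask(scores, budget, temperature=0.1, iters=60):
    """Relaxed top-k: mask_i = sigmoid((s_i - tau) / temperature).

    tau is found by bisection so the mask sums to `budget`
    (requires 0 < budget < len(scores)).
    """
    lo = min(scores) - 50 * temperature  # nearly every token kept
    hi = max(scores) + 50 * temperature  # nearly every token dropped
    for _ in range(iters):
        tau = (lo + hi) / 2
        kept = sum(sigmoid((s - tau) / temperature) for s in scores)
        if kept > budget:
            lo = tau  # keeping too much: raise the threshold
        else:
            hi = tau
    tau = (lo + hi) / 2
    return [sigmoid((s - tau) / temperature) for s in scores]


# Toy single-head example with made-up 2-d keys/values and scorer weights.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
values = [[0.2, 0.1], [0.0, 0.3], [0.4, 0.4], [0.1, 0.0]]
w_k, w_v = [0.7, -0.2], [0.5, 0.5]

scores = [intrinsic_score(k, v, w_k, w_v) for k, v in zip(keys, values)]
budgets = soft_head_budgets([0.3, -0.3], total_budget=2.0)  # two heads
mask = soft_keep_mask(scores, budgets[0])                   # head 0 only
print([round(m, 3) for m in mask])
```

As `temperature` shrinks, the mask approaches a hard top-k selection. A larger temperature gives smoother gradients but a blurrier keep/drop boundary.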
