CL · AI · May 25

IndexMem: Learned KV-Cache Eviction with Latent Memory for Long-Context LLM Inference

arXiv:2605.25475 · 91.9
Predicted impact: top 27% in CL · last 90 days
Originality: Highly original
AI Analysis

For LLM practitioners, this addresses the memory bottleneck of long-context inference with a learned eviction policy that outperforms heuristic methods.

IndexMem introduces a learnable indexer for KV-cache eviction and a latent memory module that compresses evicted tokens, enabling accurate long-context LLM inference under a bounded KV budget. It improves RULER scores by up to 25 points under aggressive eviction and achieves higher LongBench scores than existing eviction policies.
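To make "eviction under a bounded KV budget" concrete, here is a minimal NumPy sketch of score-based eviction: each cached token gets an importance score, and only the top `budget` entries are kept. The function name `evict_to_budget` and the random linear scorer `w_indexer` are hypothetical stand-ins for illustration; the summary does not describe the indexer's architecture or how it is trained.

```python
import numpy as np

def evict_to_budget(keys, values, scores, budget):
    """Keep the `budget` highest-scoring KV entries, preserving token order.

    keys, values: arrays of shape (n_tokens, d)
    scores: array of shape (n_tokens,), higher means more important
    Returns (kept_idx, evicted_idx) as sorted index arrays.
    """
    n = keys.shape[0]
    if budget >= n:
        return np.arange(n), np.array([], dtype=int)
    # argpartition places the `budget` largest scores in the last slots
    # without paying for a full sort.
    top = np.argpartition(scores, n - budget)[n - budget:]
    kept_idx = np.sort(top)
    evicted_idx = np.setdiff1d(np.arange(n), kept_idx)
    return kept_idx, evicted_idx

rng = np.random.default_rng(0)
n_tokens, d, budget = 16, 8, 4
keys = rng.normal(size=(n_tokens, d))
values = rng.normal(size=(n_tokens, d))

# ASSUMPTION: a random linear scorer stands in for the learned indexer.
w_indexer = rng.normal(size=d)
scores = keys @ w_indexer

kept_idx, evicted_idx = evict_to_budget(keys, values, scores, budget)
kept_keys, kept_values = keys[kept_idx], values[kept_idx]
```

Heuristic policies would fill `scores` with a fixed rule (for example recency or accumulated attention weight); the learned variant replaces that rule with a trained predictor while the budgeted selection step stays the same.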

Large Language Models (LLMs) are increasingly expected to operate over long contexts, yet standard softmax attention incurs a KV cache that grows linearly with sequence length, quickly becoming the bottleneck for long-context inference. A practical remedy is to evict less important KV entries; however, existing eviction policies are largely heuristic and struggle to capture the rich, input-dependent distribution of token importance. In this work, we introduce a learnable indexer that predicts KV importance, enabling more accurate retention of critical tokens. Meanwhile, naively evicting tokens permanently discards their information, leading to irreversible forgetting and degraded retrieval over long ranges. To address this, we propose a lightweight latent memory module that compresses evicted tokens into a compact, online-updated state and provides residual readouts to compensate for the attention contributions lost through KV eviction. Collectively, our method enables accurate long-context inference under a bounded KV budget, delivering consistent improvements on RULER (4K/16K) across Qwen, Mistral, and Llama models (up to 25 points under aggressive eviction), markedly more stable Needle-in-a-Haystack retrieval, and superior LongBench scores and compression curves compared to existing eviction policies.
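The abstract does not specify the form of the latent memory, so the following is only an illustrative sketch of the general idea: evicted key-value pairs are folded one at a time into a fixed-size state, and a readout from that state is added to the attention output over the retained cache. The sketch swaps in a linear-attention-style accumulator with an ELU+1 feature map; the class `LatentMemory`, the fixed 0.5 mixing gate, and the "keep the most recent tokens" rule are all assumptions, not the paper's design.

```python
import numpy as np

def feature_map(x):
    # Positive feature map (ELU + 1), borrowed from linear attention.
    # ASSUMPTION: chosen for illustration only.
    return np.where(x > 0, x + 1.0, np.exp(x))

class LatentMemory:
    """Fixed-size state that absorbs evicted (key, value) pairs online."""

    def __init__(self, d_key, d_value):
        self.S = np.zeros((d_key, d_value))  # sum of phi(k) v^T outer products
        self.z = np.zeros(d_key)             # sum of phi(k), used as normalizer

    def write(self, key, value):
        phi_k = feature_map(key)
        self.S += np.outer(phi_k, value)
        self.z += phi_k

    def read(self, query, eps=1e-6):
        phi_q = feature_map(query)
        return (phi_q @ self.S) / (phi_q @ self.z + eps)

def softmax_attention(query, keys, values):
    logits = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
d = 8
keys = rng.normal(size=(32, d))
values = rng.normal(size=(32, d))
query = rng.normal(size=d)

kept = np.arange(24, 32)   # illustrative rule: retain the 8 most recent tokens
memory = LatentMemory(d, d)
for i in range(24):        # evicted tokens update the state one at a time
    memory.write(keys[i], values[i])

# Attention over the retained cache plus a residual readout from memory.
# ASSUMPTION: the 0.5 gate is a placeholder for whatever mixing is learned.
output = softmax_attention(query, keys[kept], values[kept]) + 0.5 * memory.read(query)
```

The point of the sketch is the memory footprint: `memory.S` and `memory.z` keep the same shape no matter how many tokens are evicted, so the total state stays bounded while evicted tokens still contribute something to the output.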

