CL | Aug 30, 2025

GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction

arXiv:2509.00388v1 | 3 citations | h-index: 4 | EMNLP

Originality: Incremental advance

AI Analysis

This work addresses memory and performance bottlenecks in LLM inference for long-context applications and is an incremental improvement over existing eviction strategies.

The paper tackles inefficient KV cache management in large language models on long text sequences. It proposes GraphKV, a graph-based framework that dynamically updates token importance through decay-signal-propagation, achieving up to 30% memory reduction and a 15% speedup compared to static methods.

Efficient Key-Value (KV) cache management is essential for processing long text sequences in large language models (LLMs), where memory constraints often limit performance. Conventional KV eviction strategies, such as top-k selection based on attention scores, depend on static heuristics that fail to capture the evolving implicit dependencies among tokens during inference. To overcome this, we propose GraphKV, a graph-based framework that redefines token selection for KV cache compression. In GraphKV, tokens are modeled as nodes with importance scores, and edges represent their similarity relationships. Through a decay-signal-propagation mechanism, token importance is dynamically updated by propagating information across the graph, enabling adaptive retention of the most contextually significant tokens. GraphKV can be integrated into existing KV cache eviction methods such as SnapKV and PyramidKV in a plug-and-play manner. Code will be released on GitHub.
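
The contrast between static top-k selection and graph-based score updates can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`top_k`, `select_with_decay`), the use of cosine similarity between key vectors as edge weights, the multiplicative decay rule, and the `decay` parameter are all assumptions made for illustration. The idea shown is that once a token is retained, the scores of tokens similar to it are decayed, so near-duplicates are less likely to fill the remaining budget.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k(scores, budget):
    """Static baseline: keep the `budget` highest-scoring token indices."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:budget])


def select_with_decay(keys, scores, budget, decay=1.0):
    """Greedy selection with decay propagation (illustrative assumption).

    keys   -- one key vector per token (nodes of the graph)
    scores -- initial importance per token, e.g. from attention
    decay  -- hypothetical strength in [0, 1]; 0 reduces to static top-k
    """
    scores = list(scores)  # copy so the caller's list is not mutated
    remaining = set(range(len(keys)))
    kept = []
    while remaining and len(kept) < budget:
        # Retain the currently most important token (ties go to the lower index).
        best = max(remaining, key=lambda i: (scores[i], -i))
        kept.append(best)
        remaining.remove(best)
        # Propagate a decay signal along similarity edges to the other tokens.
        for j in remaining:
            sim = max(0.0, cosine(keys[best], keys[j]))
            scores[j] *= 1.0 - decay * sim
    return sorted(kept)


# Tokens 0 and 1 have nearly identical keys; token 2 is dissimilar.
keys = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
scores = [0.9, 0.8, 0.5]
static_choice = top_k(scores, budget=2)
graph_choice = select_with_decay(keys, scores, budget=2)
```

In this toy input, the static baseline keeps the two near-duplicate tokens, while the decayed selection keeps one of them plus the dissimilar token. Because the update only rewrites the score vector, a rule of this shape could in principle sit in front of any eviction method that ranks tokens by score, which is consistent with the plug-and-play claim above.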

