CL AI · Aug 14, 2025

SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression

arXiv:2508.15806v1
Originality: Highly original
AI Analysis

This work addresses efficient inference for LLMs, offering a domain-specific incremental improvement in KV cache compression.

The paper tackles the KV cache storage pressure that grows with input sequence length in LLMs. It proposes SurfaceLogicKV, a two-stage compression method based on attention behaviors, and reports improved compression robustness with competitive performance across tasks and long sequences.

The increasing input sequence length in Large Language Models (LLMs) puts significant pressure on key-value (KV) cache storage, making efficient inference challenging. Explicitly distinguishing attention behavior into our self-defined surface memorization and logic construction reveals their essential roles in long-context reasoning. We observe that an individual attention head can display various behaviors, with nearly 98.5% effectively ignoring completely irrelevant information. The remaining 1.5% behaves as logic construction, and 0.5% behaves as surface memorization. Based on layer- and head-wise integration, we propose a novel two-stage SurfaceLogicKV method to utilize these attention behaviors for KV cache compression. As a result, it achieves improved compression robustness while maintaining competitive performance across various tasks and long sequences compared to baselines, or even FullKV in some specific situations.
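
The abstract does not spell out the two stages, so the following is only a generic illustration of head-wise KV cache compression, not the SurfaceLogicKV algorithm itself. It assumes a hypothetical per-head importance score (`head_scores`) and one row of attention weights per head (`attn`), splits a total token budget across heads in proportion to the scores, and then keeps the most-attended cached tokens within each head. All function names and parameters here are illustrative assumptions.

```python
# Generic sketch of head-wise KV cache compression (NOT the paper's method).
# Assumption: head_scores is some per-head importance measure; the abstract
# does not say how SurfaceLogicKV scores heads or layers.

def allocate_budgets(head_scores, total_budget, min_per_head=1):
    """Split total_budget token slots across heads in proportion to head_scores."""
    n = len(head_scores)
    if total_budget < n * min_per_head:
        raise ValueError("total_budget too small for min_per_head")
    spare = total_budget - n * min_per_head
    total = sum(head_scores)
    if total <= 0:
        shares = [spare / n] * n
    else:
        shares = [spare * s / total for s in head_scores]
    budgets = [min_per_head + int(share) for share in shares]
    # Hand out slots lost to rounding down, largest fractional part first.
    leftover = max(total_budget - sum(budgets), 0)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets


def select_tokens(attn_row, budget):
    """Return sorted indices of the `budget` cached tokens with the highest attention."""
    budget = min(budget, len(attn_row))
    top = sorted(range(len(attn_row)), key=lambda t: attn_row[t], reverse=True)[:budget]
    return sorted(top)


def compress_kv(attn, head_scores, total_budget):
    """Per head: assign a budget, then pick which cached token positions to keep."""
    budgets = allocate_budgets(head_scores, total_budget)
    return [select_tokens(row, b) for row, b in zip(attn, budgets)]


# Two heads, five cached tokens, six slots in total.
attn = [
    [0.05, 0.40, 0.10, 0.30, 0.15],  # head 0
    [0.50, 0.05, 0.05, 0.10, 0.30],  # head 1
]
kept = compress_kv(attn, head_scores=[3.0, 1.0], total_budget=6)
print(kept)  # [[1, 2, 3, 4], [0, 4]]
```

In this toy run the higher-scored head retains four of its five cached tokens and the lower-scored head retains two, which shows how a non-uniform budget lets some heads keep more context than others under the same overall memory limit.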

