AI · Apr 11

LoopGuard: Breaking Self-Reinforcing Attention Loops via Dynamic KV Cache Intervention

Dongjie Xu, Hao Wu, Weijie Shi, Yue Cui, Yuanjun Liu, Jiawei Li, Haolun Ma, An Liu, Jia Zhu, Jiajie Xu

arXiv:2604.1004475.11 citations · h-index: 12

AI Analysis

This work addresses a critical failure mode in long-context LLM generation, offering a practical solution to improve reliability and output quality.

The paper identifies a failure mode in long-context generation where decoding collapses into repetition loops due to collapsed attention patterns and KV cache reuse. It introduces LoopBench for benchmarking and LoopGuard, a KV cache intervention that reduces loop incidence by over 90 percentage points while improving output diversity.

Through systematic experiments on long-context generation, we observe a damaging failure mode in which decoding can collapse into persistent repetition loops. We find that this degeneration is driven by collapsed attention patterns, where a subset of heads locks onto a narrow suffix of the history, and is further stabilized by inference-time KV cache reuse. Crucially, since many existing KV cache policies rely on attention-based importance, this collapse can produce spuriously high scores for repetitive tokens, causing cache management to inadvertently amplify repetition. To study this phenomenon in a controlled and reproducible manner, we introduce LoopBench, a benchmark with explicit loop-inducing conditions and loop-oriented metrics that quantify repetition severity and generation instability beyond downstream task scores. Building on these insights, we propose LoopGuard, a lightweight, plug-in KV cache guard that detects loop onset online and disrupts the feedback cycle by pruning repetitive tail spans under a fixed cache budget. Experiments on LoopBench show that LoopGuard reduces loop incidence by over 90 percentage points, while restoring output diversity and reducing token waste.
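The detect-then-prune idea described above can be illustrated with a minimal sketch. The abstract does not specify how LoopGuard detects loop onset or selects spans to prune, so everything below is an assumption made for illustration: the function names (`detect_tail_loop`, `prune_repetitive_tail`), the thresholds, and the use of a token-level periodicity check as a stand-in for the paper's detector are all hypothetical. The cache is modeled as a plain list with one entry per token position, not as real per-layer key/value tensors.

```python
from typing import List, Optional, Sequence, Tuple


def detect_tail_loop(
    tokens: Sequence[int],
    max_period: int = 8,      # hypothetical: longest repeated span to look for
    min_repeats: int = 3,     # hypothetical: copies needed to call it a loop
) -> Optional[Tuple[int, int]]:
    """Return (period, repeats) if the tail of `tokens` is a repeated span."""
    n = len(tokens)
    for period in range(1, max_period + 1):
        if n < period * min_repeats:
            break  # longer periods need even more tokens, so stop here
        unit = list(tokens[n - period:])
        repeats = 1
        # Walk backwards one period at a time while the span keeps matching.
        while (repeats + 1) * period <= n and list(
            tokens[n - (repeats + 1) * period: n - repeats * period]
        ) == unit:
            repeats += 1
        if repeats >= min_repeats:
            return period, repeats
    return None


def prune_repetitive_tail(
    tokens: List[int],
    cache: List[object],
    keep_repeats: int = 1,    # hypothetical: copies of the span to retain
) -> Tuple[List[int], List[object]]:
    """Drop the redundant repeated copies from the tail of tokens and cache.

    Assumes `cache` has exactly one entry per token position.
    """
    hit = detect_tail_loop(tokens)
    if hit is None:
        return tokens, cache
    period, repeats = hit
    drop = (repeats - keep_repeats) * period
    if drop <= 0:
        return tokens, cache
    cut = len(tokens) - drop
    return tokens[:cut], cache[:cut]


if __name__ == "__main__":
    history = [5, 6, 7, 1, 2, 1, 2, 1, 2, 1, 2]
    kv = [f"kv{i}" for i in range(len(history))]  # placeholder cache entries
    new_tokens, new_kv = prune_repetitive_tail(history, kv)
    print(new_tokens, len(new_kv))
```

A real intervention would operate on per-layer, per-head key/value tensors, would need to handle positional encodings after removal, and might use attention statistics and not token matching to detect collapse; this sketch ignores all of that and only shows the shape of the feedback-breaking step.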
