AI · Apr 11

LoopGuard: Breaking Self-Reinforcing Attention Loops via Dynamic KV Cache Intervention

arXiv:2604.1004475.11 citations · h-index: 12
AI Analysis

This work addresses a critical failure mode in long-context LLM generation, offering a practical solution to improve reliability and output quality.

The paper identifies a failure mode in long-context generation where decoding collapses into repetition loops due to collapsed attention patterns and KV cache reuse. It introduces LoopBench for benchmarking and LoopGuard, a KV cache intervention that reduces loop incidence by over 90 percentage points while improving output diversity.

Through systematic experiments on long-context generation, we observe a damaging failure mode in which decoding can collapse into persistent repetition loops. We find that this degeneration is driven by collapsed attention patterns, where a subset of heads locks onto a narrow suffix of the history, and is further stabilized by inference-time KV cache reuse. Crucially, since many existing KV cache policies rely on attention-based importance, this collapse can produce spuriously high scores for repetitive tokens, causing cache management to inadvertently amplify repetition. To study this phenomenon in a controlled and reproducible manner, we introduce LoopBench, a benchmark with explicit loop-inducing conditions and loop-oriented metrics that quantify repetition severity and generation instability beyond downstream task scores. Building on these insights, we propose LoopGuard, a lightweight, plug-in KV cache guard that detects loop onset online and disrupts the feedback cycle by pruning repetitive tail spans under a fixed cache budget. Experiments on LoopBench show that LoopGuard reduces loop incidence by over 90 percentage points, while restoring output diversity and reducing token waste.
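The abstract describes LoopGuard only at a high level, so the following is a minimal sketch of the general idea (detect a repeating tail online, then prune the repeated span from the cache), not the authors' implementation. It assumes the cache holds exactly one entry per generated token and that a loop shows up as an exact back-to-back repetition of token IDs. The function names `detect_tail_loop` and `prune_repetitive_tail`, the parameters `max_period`, `min_repeats`, and `keep_repeats`, and the choice to keep the most recent copy of the span are all hypothetical.

```python
from typing import List, Optional, Tuple


def detect_tail_loop(
    tokens: List[int], max_period: int = 8, min_repeats: int = 3
) -> Optional[Tuple[int, int]]:
    """Return (period, repeats) if the tail of `tokens` is a span of length
    `period` repeated at least `min_repeats` times back to back, else None."""
    n = len(tokens)
    for period in range(1, max_period + 1):
        if n < period * min_repeats:
            break  # longer periods cannot fit min_repeats copies either
        unit = tokens[n - period:]
        repeats = 1
        # Walk backwards one period at a time while the span keeps matching.
        while (
            n - (repeats + 1) * period >= 0
            and tokens[n - (repeats + 1) * period : n - repeats * period] == unit
        ):
            repeats += 1
        if repeats >= min_repeats:
            return period, repeats
    return None


def prune_repetitive_tail(
    cache_positions: List[int], period: int, repeats: int, keep_repeats: int = 1
) -> List[int]:
    """Drop cache entries for all but the last `keep_repeats` copies of the
    looped span. Assumes one cache entry per token (hypothetical layout)."""
    drop = (repeats - keep_repeats) * period
    if drop <= 0:
        return list(cache_positions)
    tail_start = len(cache_positions) - repeats * period
    return cache_positions[:tail_start] + cache_positions[tail_start + drop:]


# Toy history: four ordinary tokens followed by the span [1, 2, 3] three times.
history = [5, 9, 2, 7, 1, 2, 3, 1, 2, 3, 1, 2, 3]
loop = detect_tail_loop(history)
if loop is not None:
    period, repeats = loop
    kept = prune_repetitive_tail(list(range(len(history))), period, repeats)
    print(period, repeats, kept)
```

A real guard would operate on key/value tensors per layer and head and would likely use attention statistics rather than exact token matching. The sketch only illustrates how pruning the repeated tail frees cache budget and removes the entries that the collapsed heads keep attending to.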

