AI · Apr 13

Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

arXiv:2604.11462 · 99.62 citations · h-index: 16
AI Analysis

For LLM agents on long-horizon tasks, this work pairs a lightweight, RL-trained context-curation model with a frozen executor model, raising success rates while reducing token consumption.

LLMs suffer from a context bottleneck and the lost-in-the-middle effect in long-horizon tasks. The authors propose a framework in which a lightweight policy model (ContextCurator), trained with RL, prunes noise from the agent's context. It improves the success rate on WebArena from 36.4% to 41.2% with 8.8% fewer tokens, and on DeepSearch from 53.9% to 57.1% with 8x fewer tokens.

Large Language Models (LLMs) struggle with long-horizon tasks due to the "context bottleneck" and the "lost-in-the-middle" phenomenon, where accumulated noise from verbose environments degrades reasoning over multi-turn interactions. To address this issue, we introduce a symbiotic framework that decouples context management from task execution. Our architecture pairs a lightweight, specialized policy model, ContextCurator, with a powerful frozen foundation model, TaskExecutor. Trained via reinforcement learning, ContextCurator actively reduces information entropy in the working memory. It aggressively prunes environmental noise while preserving reasoning anchors, that is, sparse data points that are critical for future deductions. On WebArena, our framework improves the success rate of Gemini-3.0-flash from 36.4% to 41.2% while reducing token consumption by 8.8% (from 47.4K to 43.3K). On DeepSearch, it achieves a 57.1% success rate, compared with 53.9%, while reducing token consumption by a factor of 8. Remarkably, a 7B ContextCurator matches the context management performance of GPT-4o, providing a scalable and computationally efficient paradigm for autonomous long-horizon agents.
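The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Turn`, `keyword_curator`, and `run_episode` are hypothetical, and the learned RL policy is replaced by a simple keyword heuristic that keeps the most recent turns plus any older turn containing an "anchor" string. The only point it shows is the control flow, in which a curator rewrites the working memory before a separate, unchanged executor chooses the next action.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    role: str  # "observation" or "action"
    text: str


# Hypothetical interfaces, assumed for illustration only.
Curator = Callable[[List[Turn]], List[Turn]]   # prunes the working memory
Executor = Callable[[List[Turn]], str]         # frozen model: memory -> action
Environment = Callable[[str], str]             # action -> next observation


def keyword_curator(keep_last: int, anchors: List[str]) -> Curator:
    """Toy stand-in for a learned curation policy (assumption, not the paper's).

    Keeps the `keep_last` most recent turns, plus any older turn whose text
    contains one of the anchor strings.
    """
    def curate(history: List[Turn]) -> List[Turn]:
        cutoff = max(len(history) - keep_last, 0)
        return [
            turn for i, turn in enumerate(history)
            if i >= cutoff or any(a in turn.text for a in anchors)
        ]
    return curate


def run_episode(
    curator: Curator,
    executor: Executor,
    env: Environment,
    first_observation: str,
    max_steps: int,
) -> Tuple[List[str], List[Turn]]:
    """Alternate curation and execution for at most `max_steps` steps."""
    memory = [Turn("observation", first_observation)]
    actions: List[str] = []
    for _ in range(max_steps):
        memory = curator(memory)      # context management, decoupled
        action = executor(memory)     # task execution on the pruned context
        actions.append(action)
        if action == "stop":
            break
        memory.append(Turn("action", action))
        memory.append(Turn("observation", env(action)))
    return actions, memory
```

In this sketch the executor never sees the raw history, only what the curator returns, so a stronger curator can be swapped in without touching the executor. That mirrors the frozen-TaskExecutor arrangement the abstract describes, under the assumptions stated above.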

