Scaling Long-Horizon LLM Agent via Context-Folding
This work addresses a fundamental bottleneck for LLM agents on complex tasks such as Deep Research and software engineering (SWE), though the contribution appears incremental because it builds on existing context-management ideas.
The paper tackles context-length constraints in LLM agents on long-horizon tasks. It introduces Context-Folding, a framework in which an agent manages its working context by branching into a sub-trajectory for a subtask and folding that sub-trajectory on completion. The resulting agent matches or outperforms ReAct baselines while using an active context 10x smaller.
Large language model (LLM) agents are fundamentally constrained by context length on long-horizon tasks. We introduce Context-Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome. To make this behavior learnable, we develop an end-to-end reinforcement learning framework, FoldGRPO, with specific process rewards to encourage effective task decomposition and context management. On complex long-horizon tasks (Deep Research and SWE), our folding agent matches or outperforms the ReAct baselines while using an active context 10$\times$ smaller, and it significantly outperforms models that rely on summarization-based context management.
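The branch-and-fold mechanism described in the abstract can be illustrated with a toy data structure. The following is a minimal sketch, not the paper's implementation: the class `FoldingContext`, its methods `add`, `branch`, `fold`, and `active`, and the `[branch]`/`[folded]` markers are all hypothetical names chosen for illustration. The summary string is supplied by the caller here, whereas in the paper the agent itself produces it. The sketch supports a single open branch at a time and says nothing about the FoldGRPO training procedure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FoldingContext:
    """Toy working context with branch/fold (hypothetical, not the paper's API)."""

    main: List[str] = field(default_factory=list)  # entries kept in the main thread
    branch_steps: Optional[List[str]] = None       # steps of the open branch, if any
    total_steps: int = 0                           # every step ever recorded via add()

    def add(self, step: str) -> None:
        """Record a step in the open branch if there is one, else in the main thread."""
        self.total_steps += 1
        target = self.branch_steps if self.branch_steps is not None else self.main
        target.append(step)

    def branch(self, subtask: str) -> None:
        """Open a sub-trajectory for a subtask (only one open branch in this sketch)."""
        if self.branch_steps is not None:
            raise RuntimeError("a branch is already open")
        self.main.append(f"[branch] {subtask}")
        self.branch_steps = []

    def fold(self, summary: str) -> None:
        """Close the branch: drop its intermediate steps, keep only the summary."""
        if self.branch_steps is None:
            raise RuntimeError("no open branch to fold")
        self.branch_steps = None
        self.main.append(f"[folded] {summary}")

    def active(self) -> List[str]:
        """Entries currently visible to the agent: main thread plus any open branch."""
        return self.main + (self.branch_steps or [])


def demo() -> tuple:
    """Return the active-context size before and after folding a five-step branch."""
    ctx = FoldingContext()
    ctx.add("read task description")
    ctx.branch("locate the failing test")
    for i in range(5):
        ctx.add(f"tool call {i}")
    before = len(ctx.active())
    ctx.fold("failing test identified")
    after = len(ctx.active())
    return before, after
```

In this sketch, calling `demo()` shows the active context shrinking once the branch is folded, while `total_steps` continues to count every step the agent took. That gap between total trajectory length and active context size is the quantity the abstract's "10$\times$ smaller" claim refers to.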