CL · Dec 26, 2025

Context as a Tool: Context Management for Long-Horizon SWE-Agents

arXiv:2512.22087v116 citations · h-index: 12
Originality: Highly original
AI Analysis

This paper addresses context management for software engineering agents. It offers a novel approach to improving long-horizon interactions, though its contribution is an incremental enhancement of existing agent frameworks.

The paper tackles context explosion and degraded reasoning in long-horizon software engineering agents by proposing CAT, a proactive context management paradigm. A model trained under this paradigm, SWE-Compressor, reaches a 57.6% solved rate on SWE-Bench-Verified, outperforming ReAct-based agents and static compression baselines.

Agents based on large language models have recently shown strong potential on real-world software engineering (SWE) tasks that require long-horizon interaction with repository-scale codebases. However, most existing agents rely on append-only context maintenance or passively triggered compression heuristics, which often lead to context explosion, semantic drift, and degraded reasoning in long-running interactions. We propose CAT, a new context management paradigm that elevates context maintenance to a callable tool integrated into the decision-making process of agents. CAT formalizes a structured context workspace consisting of stable task semantics, condensed long-term memory, and high-fidelity short-term interactions, and enables agents to proactively compress historical trajectories into actionable summaries at appropriate milestones. To support context management for SWE-agents, we propose a trajectory-level supervision framework, CAT-GENERATOR, based on an offline data construction pipeline that injects context-management actions into complete interaction trajectories. Using this framework, we train a context-aware model, SWE-Compressor. Experiments on SWE-Bench-Verified demonstrate that SWE-Compressor reaches a 57.6% solved rate and significantly outperforms ReAct-based agents and static compression baselines, while maintaining stable and scalable long-horizon reasoning under a bounded context budget.
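The structured workspace described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the class name `ContextWorkspace`, its fields, and the `compress` signature are hypothetical, and the summary text is passed in by the caller where a real agent would have the model write it as part of a tool call.

```python
from dataclasses import dataclass, field


@dataclass
class ContextWorkspace:
    """Hypothetical three-part workspace, loosely following the abstract."""

    task: str                                        # stable task semantics
    memory: list[str] = field(default_factory=list)  # condensed long-term memory
    recent: list[str] = field(default_factory=list)  # high-fidelity short-term steps

    def record(self, step: str) -> None:
        """Append one raw interaction (action or observation) verbatim."""
        self.recent.append(step)

    def compress(self, summary: str, keep_last: int = 0) -> None:
        """Stand-in for the context-management tool call.

        Folds all but the last `keep_last` recent steps into one summary
        entry. The summary is supplied by the caller (assumption: in a real
        agent the model would generate it at a milestone it chooses).
        """
        if keep_last < 0:
            raise ValueError("keep_last must be non-negative")
        cut = len(self.recent) - keep_last
        if cut <= 0:
            return  # nothing old enough to fold away
        self.memory.append(summary)
        self.recent = self.recent[cut:]

    def render(self) -> str:
        """Build the prompt text: task first, then summaries, then raw steps."""
        parts = [f"TASK: {self.task}"]
        parts += [f"MEMORY: {m}" for m in self.memory]
        parts += [f"STEP: {s}" for s in self.recent]
        return "\n".join(parts)


if __name__ == "__main__":
    ws = ContextWorkspace(task="Fix the failing date parser test")
    ws.record("ran test suite; test_parse_iso fails")
    ws.record("opened parser.py; found timezone branch")
    ws.record("edited parser.py to handle 'Z' suffix")
    # The agent decides it has reached a milestone and calls the tool.
    ws.compress("Located bug in timezone branch of parser.py", keep_last=1)
    print(ws.render())
```

The design point this illustrates is that compression is an action the agent invokes itself, rather than a heuristic triggered when the context exceeds a length threshold, so the rendered context stays bounded while the most recent steps remain verbatim.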

