SE · AI · Mar 21

SWE-Next: Scalable Real-World Software Engineering Tasks for Agents

arXiv:2603.20691 · 92.0 · 2 citations · h-index: 9
Predicted impact: top 7% in SE · last 90 days
Originality: Incremental advance
AI Analysis

The paper addresses inefficient collection of executable training data for software engineering agents. It is an incremental advance, since it builds on existing methods for task generation.

The paper tackles the challenge of scaling executable software engineering data for training agents. It introduces SWE-Next, a framework that mines real merged pull requests to create self-verifying task instances. The resulting dataset contains 2,308 instances drawn from 102,582 candidate commit pairs, and training on it improves downstream pass@1 with fewer or comparable training trajectories.

Executable software engineering data is valuable for training SWE agents, but scaling it remains difficult for two reasons: only a small fraction of real repository changes yield verifiable, high-signal task instances, and naively building repository-specific environments quickly becomes the dominant systems cost. We present SWE-Next, an execution-grounded framework for scalable SWE task and trajectory collection. On the data side, SWE-Next mines real merged pull requests, executes candidate base/merged commit pairs, and retains only those that produce strict test improvements without regressions, yielding self-verifying instances. It also applies strict submission gating so that collected trajectories remain evidence-driven rather than speculative. On the systems side, SWE-Next introduces reusable repo-quarter profiles, which reuse the same environment across nearby commits in time while keeping each task run separate and reproducible. Using only 30 hours and 639GB of environment storage, SWE-Next processes 3,971 seed repositories and 102,582 candidate commit pairs mined from real merged PRs to construct a dataset of 2,308 self-verifying instances. Experiments show that SWE-Next improves downstream pass@1 with fewer or comparable training trajectories, indicating that its gains come not from a stronger trajectory generator, but from higher-signal execution-grounded supervision and more efficient data collection.
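The two mechanisms described above, the retention filter and the repo-quarter profile, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `is_self_verifying` and `repo_quarter_key`, the `"pass"`/`"fail"` status strings, and the key format are all hypothetical. The sketch assumes that "strict test improvement" means at least one test passes at the merged commit that did not pass at the base commit, and that "no regressions" means every test passing at the base commit still passes after the merge. The paper's exact criteria may differ.

```python
from datetime import date

PASS = "pass"  # hypothetical status label; any other value counts as not passing


def is_self_verifying(base: dict[str, str], merged: dict[str, str]) -> bool:
    """Return True if a base/merged commit pair would be retained.

    `base` and `merged` map test names to statuses observed when the
    test suite is executed at each commit (assumed representation).
    """
    # Regression: a test that passed at base does not pass after the merge.
    # A test missing from the merged run is treated as not passing.
    regressed = any(
        status == PASS and merged.get(test) != PASS
        for test, status in base.items()
    )
    # Strict improvement: some test passes after the merge that did not
    # pass (or did not exist) at the base commit.
    improved = any(
        status == PASS and base.get(test) != PASS
        for test, status in merged.items()
    )
    return improved and not regressed


def repo_quarter_key(repo: str, commit_date: date) -> str:
    """Build a hypothetical cache key so that commits from the same
    repository and calendar quarter share one environment profile."""
    quarter = (commit_date.month - 1) // 3 + 1
    return f"{repo}@{commit_date.year}Q{quarter}"
```

For example, under these assumptions a pair whose merged commit fixes one failing test and breaks nothing is retained, while a pair that fixes one test but breaks another is discarded. Two commits to the same repository dated in May and June of the same year map to the same key, so they would reuse one environment, while each task run can still execute in its own isolated instance of it.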

