SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling
This work addresses the automation of software development for developers and researchers, though its contribution is incremental, since it builds on existing LLMs and LLM-powered toolkits.
The paper tackles the challenge of building effective software engineering agents by introducing SWE-Dev, an agent built on open-source LLMs. A pipeline synthesizes test cases for patch evaluation, and scaled-up agent trajectories supply the training data. The resulting 7B and 32B models reach success rates of 23.4% and 36.6%, respectively, on the SWE-bench-Verified benchmark.
Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use, such as software engineering (SWE). Recent LLM-powered toolkits, such as OpenAI Codex and Cursor, have offered end-to-end automation of the software development process. However, building effective SWE agents remains challenging due to the lack of high-quality training data and effective test cases. To address this issue, we present SWE-Dev, an SWE agent built upon open-source LLMs. First, we develop a robust pipeline to synthesize test cases for patch evaluation. Second, we scale up agent trajectories to construct the training data for building SWE-Dev. Experiments on the SWE-bench-Verified benchmark show that the SWE-Dev models can achieve top performance among all open SWE agents. Specifically, the success rates of the SWE-Dev 7B and 32B parameter models reach 23.4% and 36.6%, respectively, outperforming state-of-the-art open-source models. All code, models, and datasets are publicly available at https://github.com/THUDM/SWE-Dev.
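As a rough illustration of how test cases can decide whether a candidate patch counts as a success, the sketch below assumes the fail-to-pass / pass-to-pass convention used by SWE-bench. The names `TestOutcome`, `patch_resolves`, and `success_rate` are hypothetical and are not taken from the SWE-Dev codebase, and the outcomes are made-up placeholders rather than real results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TestOutcome:
    """Result of one test before and after a candidate patch (hypothetical type)."""
    name: str
    passed_before: bool  # outcome on the unpatched repository
    passed_after: bool   # outcome once the candidate patch is applied


def patch_resolves(outcomes: List[TestOutcome]) -> bool:
    """Assumed criterion: every previously failing test now passes,
    no previously passing test breaks, and at least one test
    actually exercises the fix."""
    fail_to_pass = [o for o in outcomes if not o.passed_before]
    pass_to_pass = [o for o in outcomes if o.passed_before]
    return (
        len(fail_to_pass) > 0
        and all(o.passed_after for o in fail_to_pass)
        and all(o.passed_after for o in pass_to_pass)
    )


def success_rate(resolved: List[bool]) -> float:
    """Percentage of task instances whose patch resolved the issue."""
    return 100.0 * sum(resolved) / len(resolved)


# Placeholder outcomes for two imaginary task instances.
instance_a = [
    TestOutcome("test_bug_is_fixed", passed_before=False, passed_after=True),
    TestOutcome("test_existing_behavior", passed_before=True, passed_after=True),
]
instance_b = [
    TestOutcome("test_bug_is_fixed", passed_before=False, passed_after=True),
    TestOutcome("test_existing_behavior", passed_before=True, passed_after=False),
]

results = [patch_resolves(instance_a), patch_resolves(instance_b)]
print(f"success rate: {success_rate(results):.1f}%")
```

Under this criterion, a patch that fixes the target bug but breaks an existing test is not counted as a success, which is why the quality of the synthesized test cases directly affects the reported rate.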