CL · Jun 10, 2025

SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner

arXiv:2506.09003v2 · 12 citations · h-index: 12 · Has Code · ICML
Originality: Incremental advance
AI Analysis

The work addresses the need for automated, verifiable software engineering data, particularly for test-driven development (TDD) tasks. It is an incremental advance because it builds on existing data-synthesis methods.

The paper generates software engineering data with SWE-Flow, a framework that derives incremental development steps from unit tests using a Runtime Dependency Graph (RDG). The result is 16,061 training instances and 2,020 test instances, and fine-tuning on this data improves TDD-based coding performance.
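The RDG is built from what actually executes when a unit test runs. As a minimal sketch, assuming a Python project and a simple name-based filter, the caller-to-callee edges exercised by one test can be recorded with the standard `sys.setprofile` hook. `trace_calls`, the `tracked` name set and the toy functions are hypothetical illustrations, not SWE-Flow's implementation.

```python
import sys
from collections import defaultdict


def trace_calls(test_fn, tracked):
    """Run test_fn once and return caller -> callee edges among tracked functions.

    `tracked` is a hypothetical stand-in for "functions defined in the project":
    a set of plain function names (code.co_name). The test itself is the root.
    """
    tracked = set(tracked) | {test_fn.__name__}
    edges = defaultdict(set)
    stack = []  # tracked functions currently on the call stack

    def profiler(frame, event, arg):
        name = frame.f_code.co_name
        if name not in tracked:
            return  # ignore library/stdlib frames
        if event == "call":
            if stack:
                edges[stack[-1]].add(name)  # nearest tracked caller -> callee
            stack.append(name)
        elif event == "return" and stack:
            stack.pop()

    sys.setprofile(profiler)
    try:
        test_fn()
    finally:
        sys.setprofile(None)
    return {caller: sorted(callees) for caller, callees in edges.items()}


# Toy project code and one unit test (illustrative only).
def normalize(s):
    return s.strip().lower()


def tokenize(s):
    return normalize(s).split()


def count_words(s):
    return len(tokenize(s))


def test_count_words():
    assert count_words("  Hello World ") == 2


rdg = trace_calls(test_count_words, {"normalize", "tokenize", "count_words"})
print(rdg)
```

Keying nodes by bare function names conflates functions or methods that share a name across modules; a fuller tracer would key on file path plus qualified name.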

We introduce **SWE-Flow**, a novel data synthesis framework grounded in Test-Driven Development (TDD). Unlike existing software engineering datasets that rely on human-submitted issues, **SWE-Flow** automatically infers incremental development steps directly from unit tests, which inherently encapsulate high-level requirements. The core of **SWE-Flow** is the construction of a Runtime Dependency Graph (RDG), which precisely captures function interactions, enabling the generation of a structured, step-by-step *development schedule*. At each step, **SWE-Flow** produces a partial codebase, the corresponding unit tests, and the necessary code modifications, resulting in fully verifiable TDD tasks. With this approach, we generated 16,061 training instances and 2,020 test instances from real-world GitHub projects, creating the **SWE-Flow-Eval** benchmark. Our experiments show that fine-tuning open models on this dataset significantly improves performance in TDD-based coding. To facilitate further research, we release all code, datasets, models, and Docker images at [Github](https://github.com/Hambaobao/SWE-Flow).
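One way to picture turning an RDG into a development schedule: each step takes one unit test and implements only the functions it needs that do not exist yet, callees before callers. The sketch below is an illustrative assumption (a greedy ordering over a hand-written RDG, using Python's standard `graphlib`), not the scheduling algorithm from the paper; `reachable`, `development_schedule` and the example graph are hypothetical.

```python
from graphlib import TopologicalSorter


def reachable(rdg, root):
    """All functions transitively called from `root` in the RDG (root excluded)."""
    seen, todo = set(), list(rdg.get(root, []))
    while todo:
        fn = todo.pop()
        if fn not in seen:
            seen.add(fn)
            todo.extend(rdg.get(fn, []))
    return seen


def development_schedule(rdg, tests):
    """Greedy sketch: repeatedly pick the test that needs the fewest
    not-yet-implemented functions and emit those functions callees-first."""
    implemented, schedule = set(), []
    remaining = list(tests)
    while remaining:
        # min() keeps the earliest test in `tests` when counts tie.
        test = min(remaining, key=lambda t: len(reachable(rdg, t) - implemented))
        remaining.remove(test)
        new_fns = reachable(rdg, test) - implemented
        # graphlib maps node -> predecessors, so each callee precedes its caller.
        # Direct self-recursion is dropped; other cycles raise CycleError.
        graph = {
            fn: [c for c in rdg.get(fn, []) if c in new_fns and c != fn]
            for fn in new_fns
        }
        schedule.append((test, list(TopologicalSorter(graph).static_order())))
        implemented |= new_fns
    return schedule


# Hand-written RDG: caller -> direct callees (tests are the roots).
example_rdg = {
    "test_count_words": ["count_words"],
    "test_tokenize": ["tokenize"],
    "count_words": ["tokenize"],
    "tokenize": ["normalize"],
}
schedule = development_schedule(example_rdg, ["test_count_words", "test_tokenize"])
for step, (test, fns) in enumerate(schedule, start=1):
    print(step, test, fns)
```

In this picture, the partial codebase at each step is the set of functions implemented so far, the step's unit test is the requirement, and the listed functions are the code modifications to produce. Mutual recursion among new functions would raise `graphlib.CycleError` in this simplified version.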

Code Implementations: 1 repo