SE · AI · Apr 4

Toward Executable Repository-Level Code Generation via Environment Alignment

Ruwei Pan, Junlei Shen, Linhao Wu, Yueheng Zhu, Zixiong Yang, Yakun Zhang, Lu Zhang, Hongyu Zhang

arXiv:2604.03622 · 52.3 · h-index: 3

Predicted impact: top 38% in SE · last 90 days
Originality: Incremental advance

AI Analysis

This paper addresses the generation of executable multi-file code repositories. It is an incremental advance that improves performance on a specific bottleneck: getting a generated repository to install, resolve its dependencies and internal references, and run.

The paper tackles repository-level code generation, where existing methods struggle to produce executable multi-file repositories. It proposes EnvGraph, a framework that formulates executability as an environment alignment problem, and reports absolute improvements of 5.72–5.87 percentage points in Functional Correctness and 4.58–8.66 percentage points in Non-Functional Quality over the strongest non-EnvGraph baseline.

Large language models (LLMs) have achieved strong performance on code generation, but existing methods still struggle with repository-level code generation under executable validation. Under this evaluation setting, success is determined not by the plausibility of isolated code fragments, but by whether a generated multi-file repository can be successfully installed, have its dependencies and internal references resolved, be launched, and be validated in a real execution environment. To address this challenge, we propose EnvGraph, a framework for repository-level code generation that formulates repository executability as an environment alignment problem. EnvGraph jointly models two coupled conditions for successful repository execution, namely external dependency satisfaction and repository-internal reference resolution. It maintains a dual-layer environment representation, performs execution-evidence-based attribution, and guides repository generation through a unified targeted revision mechanism within an iterative alignment loop. We evaluate EnvGraph on repository-level code generation with three representative backbone LLMs and compare it against representative environment-aware and repository-level baselines. Experimental results show that EnvGraph consistently achieves the best performance on these repository-level benchmarks. In particular, it outperforms the strongest non-EnvGraph baseline by an absolute margin of 5.72–5.87 percentage points in Functional Correctness and 4.58–8.66 percentage points in Non-Functional Quality.

