SE · AI · Mar 1

RepoRepair: Leveraging Code Documentation for Repository-Level Automated Program Repair

Zhongqiang Pan, Chuanyi Li, Wenkang Zhong, Yi Feng, Bin Luo, Vincent Ng

arXiv:2603.01048v1 · 8.83 citations · h-index: 33

Originality: Highly original

AI Analysis

This work addresses a critical bottleneck in automated program repair for developers: handling complex cross-file issues in software repositories more effectively.

The paper tackles the challenge of scaling automated program repair from isolated functions to full repositories. It proposes RepoRepair, which uses LLMs to generate hierarchical code documentation that guides fault localization and repair, achieving repair rates of 45.7% on SWE-bench Lite ($0.44 per fix) and 37.1% on SWE-bench Multimodal ($0.56 per fix).

Automated program repair (APR) struggles to scale from isolated functions to full repositories, as it demands a global, task-aware understanding to locate necessary changes. Current methods, limited by context and reliant on shallow retrieval or costly agent iterations, falter on complex cross-file issues. To this end, we propose RepoRepair, a novel documentation-enhanced approach for repository-level fault localization and program repair. Our core insight is to leverage LLMs to generate hierarchical code documentation (from functions to files) for code repositories, creating structured semantic abstractions that enable LLMs to comprehend repository-level context and dependencies. Specifically, RepoRepair first employs a text-based LLM (e.g., DeepSeek-V3) to generate file/function-level code documentation for repositories, which serves as auxiliary knowledge to guide fault localization. Subsequently, based on the fault localization results and the issue description, a powerful LLM (e.g., Claude-4) attempts to repair the identified suspicious code snippets. Evaluated on SWE-bench Lite, RepoRepair achieves a 45.7% repair rate at a low cost of $0.44 per fix. On SWE-bench Multimodal, it delivers state-of-the-art performance with a 37.1% repair rate despite a higher cost of $0.56 per fix, demonstrating robust and cost-effective performance across diverse problem domains.
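The documentation-guided localization idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `toy_summarizer` is a hypothetical stand-in for the LLM documentation step, plain token overlap stands in for LLM-based fault localization, and every name below (`FileDoc`, `build_docs`, `localize`, the two-file `repo`) is an assumption made for illustration. It only shows the shape of the pipeline: function-level docs are built first, file-level docs are derived from them, and the issue text is matched against files and then against functions.

```python
import ast
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for an LLM call: source text in, short summary out.
Summarizer = Callable[[str], str]


@dataclass
class FileDoc:
    path: str
    summary: str = ""  # file-level documentation
    functions: Dict[str, str] = field(default_factory=dict)  # name -> function-level doc


def tokens(text: str) -> Set[str]:
    """Lowercased alphabetic tokens; underscores split snake_case names."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_summarizer(source: str) -> str:
    """Toy 'documentation': the sorted bag of words found in the source."""
    return " ".join(sorted(tokens(source)))


def build_docs(repo: Dict[str, str], summarize: Summarizer) -> List[FileDoc]:
    """Build function-level docs, then derive each file-level doc from them."""
    docs = []
    for path, source in repo.items():
        doc = FileDoc(path)
        for node in ast.parse(source).body:  # top-level functions only
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                segment = ast.get_source_segment(source, node) or ""
                doc.functions[node.name] = summarize(segment)
        doc.summary = summarize(" ".join(doc.functions.values()))
        docs.append(doc)
    return docs


def localize(issue: str, docs: List[FileDoc], top_files: int = 1) -> List[Tuple[str, str]]:
    """Rank files by overlap with the issue, then pick the best function in each."""
    want = tokens(issue)
    ranked = sorted(docs, key=lambda d: len(want & tokens(d.summary)), reverse=True)
    suspects = []
    for doc in ranked[:top_files]:
        if not doc.functions:
            continue
        best = max(doc.functions, key=lambda name: len(want & tokens(doc.functions[name])))
        suspects.append((doc.path, best))
    return suspects


# Hypothetical two-file repository and issue description.
repo = {
    "dates.py": (
        "def parse_date(text):\n    return text.split('-')\n\n"
        "def format_date(parts):\n    return '/'.join(parts)\n"
    ),
    "net.py": "def fetch(url):\n    return url\n",
}
issue = "parse_date returns wrong value when text has a timezone"

suspects = localize(issue, build_docs(repo, toy_summarizer))
print(suspects)  # [('dates.py', 'parse_date')]
```

In a fuller pipeline, the suspect list and the issue description would be passed to a repair model; that step is omitted here because the abstract does not specify the prompt or patch format.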

