PatchRecall: Patch-Driven Retrieval for Automated Program Repair
For developers using Automated Program Repair (APR) systems, PatchRecall makes automated repairs more effective by retrieving more of the relevant files with less noise.
PatchRecall addresses the tradeoff between recall and conciseness in file retrieval for Automated Program Repair by combining codebase retrieval with history-based retrieval. On SWE-Bench, it achieves higher recall without significantly increasing the number of retrieved files.
Retrieving the correct set of files from a large codebase is a crucial step in Automated Program Repair (APR). High recall is necessary to ensure that the relevant files are included, but simply increasing the number of retrieved files introduces noise and degrades efficiency. To address this tradeoff, we propose PatchRecall, a hybrid retrieval approach that balances recall with conciseness. Our method combines two complementary strategies: (1) codebase retrieval, where the current issue description is matched against the codebase to surface potentially relevant files, and (2) history-based retrieval, where similar past issues are leveraged to identify edited files as candidate targets. Candidate files from both strategies are merged and reranked to produce the final retrieval set. Experiments on SWE-Bench demonstrate that PatchRecall achieves higher recall without significantly increasing retrieved file count, enabling more effective APR.
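The two strategies and the merge-and-rerank step described above can be illustrated with a minimal sketch. It is not the PatchRecall implementation: the bag-of-words cosine similarity, the weighted-sum reranking rule, the `history_weight` and `top_n` parameters, all function names, and the toy data are assumptions made purely for illustration, since the abstract does not specify the retriever or the reranker.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens as a bag of words (stand-in for a real retriever)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count bags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def codebase_scores(issue, files):
    """Strategy 1: match the issue description against each file's content."""
    query = tokens(issue)
    return {path: cosine(query, tokens(body)) for path, body in files.items()}


def history_scores(issue, past_issues):
    """Strategy 2: files edited for a similar past issue inherit that similarity."""
    query = tokens(issue)
    scores = {}
    for text, edited_files in past_issues:
        sim = cosine(query, tokens(text))
        for path in edited_files:
            scores[path] = max(scores.get(path, 0.0), sim)
    return scores


def hybrid_retrieve(issue, files, past_issues, top_n=2, history_weight=0.5):
    """Merge both candidate sets, then rerank by a weighted sum (assumed rule)."""
    code = codebase_scores(issue, files)
    hist = history_scores(issue, past_issues)
    merged = {
        path: (1 - history_weight) * code.get(path, 0.0)
        + history_weight * hist.get(path, 0.0)
        for path in code.keys() | hist.keys()
    }
    # Sort by descending score, breaking ties by path for determinism.
    ranked = sorted(merged, key=lambda p: (-merged[p], p))
    return ranked[:top_n]


# Hypothetical toy repository and issue history.
files = {
    "auth/login.py": "def login(user, password): check password hash and create session",
    "auth/session.py": "class Session: expire token refresh timeout",
    "ui/theme.py": "color palette dark mode font",
}
past_issues = [
    ("login fails after session timeout", ["auth/session.py"]),
    ("dark mode colors are wrong", ["ui/theme.py"]),
]
issue = "login fails with wrong password error"

result = hybrid_retrieve(issue, files, past_issues)
print(result)
```

In this toy setup, `auth/session.py` shares no tokens with the issue text, so codebase retrieval alone would score it zero; it enters the final set only because a similar past issue edited it. Capping the output at `top_n` is one simple way to keep the retrieved set concise while the history signal raises recall.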