SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization
SweRank+ addresses a practical challenge for developers, maintaining large-scale multilingual codebases, though the contribution is incremental because it builds on existing ranking approaches.
The paper tackles issue localization in multilingual codebases: mapping natural-language error descriptions to the functions that need to be modified. It introduces SweRank+, which combines SweRankMulti, a cross-lingual code ranking tool, with SweRankAgent, an agentic multi-turn search setup. SweRankMulti reports state-of-the-art results on issue localization benchmarks spanning various languages, and SweRankAgent further improves on single-pass ranking.
Maintaining large-scale, multilingual codebases hinges on accurately localizing issues, which requires mapping natural-language error descriptions to the relevant functions that need to be modified. However, existing ranking approaches are often Python-centric and perform a single-pass search over the codebase. This work introduces SweRank+, a framework that couples SweRankMulti, a cross-lingual code ranking tool, with SweRankAgent, an agentic search setup, for iterative, multi-turn reasoning over the code repository. SweRankMulti comprises a code embedding retriever and a listwise LLM reranker, and is trained using a carefully curated large-scale issue localization dataset spanning multiple popular programming languages. SweRankAgent adopts an agentic search loop that moves beyond single-shot localization with a memory buffer to reason and accumulate relevant localization candidates over multiple turns. Our experiments on issue localization benchmarks spanning various languages demonstrate new state-of-the-art performance with SweRankMulti, while SweRankAgent further improves localization over single-pass ranking.
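The retrieve-then-rerank pipeline and the multi-turn loop with a memory buffer described above can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the bag-of-words `embed` stands in for a learned code embedding retriever, the token-overlap `rerank` stands in for a listwise LLM reranker, and the query refinement step (appending the top candidate's source to the issue text) is a hypothetical substitute for the agent's reasoning. The function names and the toy repository are likewise invented.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a code embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, pool, k):
    """Stage 1: score every function in the pool against the query, keep the top k."""
    q = embed(query)
    return sorted(pool, key=lambda name: cosine(q, embed(pool[name])), reverse=True)[:k]

def rerank(query, candidates, functions):
    """Stage 2: reorder the retrieved candidates (placeholder for a listwise LLM reranker)."""
    q_tokens = set(embed(query))
    return sorted(candidates, key=lambda name: len(q_tokens & set(embed(functions[name]))), reverse=True)

def localize_multi_turn(issue, functions, turns=2, k=1):
    """Run several retrieve-and-rerank turns, accumulating candidates in a memory buffer."""
    memory = []  # candidates collected so far, in order of discovery
    query = issue
    for _ in range(turns):
        # Search only functions not yet in memory so each turn can surface new candidates.
        pool = {name: src for name, src in functions.items() if name not in memory}
        ranked = rerank(query, retrieve(query, pool, k), functions)
        if not ranked:
            break  # nothing left to search
        memory.extend(ranked)
        # Hypothetical refinement: enrich the query with the top candidate's source.
        query = issue + " " + functions[ranked[0]]
    return memory

# Invented toy repository and issue text, for illustration only.
FUNCTIONS = {
    "parse_config": "def parse_config(path): read yaml config file and return dict",
    "load_yaml": "def load_yaml(path): open yaml file parse contents",
    "send_email": "def send_email(to, body): smtp send message",
}
ISSUE = "crash when reading yaml config file"

single_pass = localize_multi_turn(ISSUE, FUNCTIONS, turns=1)
multi_turn = localize_multi_turn(ISSUE, FUNCTIONS, turns=2)
```

In this toy setup the single-pass call returns only `parse_config`, while the second turn also adds `load_yaml` to the memory buffer, which mirrors at a very small scale the idea of accumulating localization candidates over multiple turns.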