Where's the Bug? Attention Probing for Scalable Fault Localization
This work addresses scalable fault localization for developers and LLM-based repair systems, offering a more efficient and accurate method than existing approaches.
The paper tackles fault localization in code with large language models by introducing Bug Attention Probe (BAP), which learns state-of-the-art fault localization without direct localization labels. Averaged across eight datasets, BAP improves top-1 accuracy by 34.6% over the strongest baseline and by 93.4% over zero-shot prompting of GPT-4o.
Ensuring code correctness remains a challenging problem even as large language models (LLMs) become increasingly capable at code-related tasks. While LLM-based program repair systems can propose bug fixes using only a user's bug report, their effectiveness is fundamentally limited by their ability to perform fault localization (FL), a challenging problem for both humans and LLMs. Existing FL approaches rely on executable test cases, require training on costly and often noisy line-level annotations, or demand resource-intensive LLMs. In this paper, we present Bug Attention Probe (BAP), a method that learns state-of-the-art fault localization without any direct localization labels, outperforming traditional FL baselines and prompting of large-scale LLMs. We evaluate our approach across a variety of code settings, including real-world Java bugs from the standard Defects4J dataset as well as seven other datasets that span a diverse set of bug types and languages. Averaged across all eight datasets, BAP improves top-1 accuracy by 34.6% over the strongest baseline and by 93.4% over zero-shot prompting of GPT-4o. BAP is also significantly more efficient than prompting, outperforming large open-weight models at a small fraction of the computational cost.
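The headline results are reported in top-1 accuracy. As a minimal sketch of how such a metric is commonly computed for line-level fault localization, the snippet below counts a program as a hit when its highest-scored line is one of the labeled buggy lines. This is an illustration under stated assumptions, not the paper's evaluation code: the function name `top1_accuracy`, the dictionary layout, and the 0-based line indices are all hypothetical choices made here.

```python
from typing import Dict, List, Set


def top1_accuracy(
    scores: Dict[str, List[float]],
    buggy_lines: Dict[str, Set[int]],
) -> float:
    """Fraction of programs whose highest-scored line is a labeled buggy line.

    scores:      hypothetical layout, program id -> one suspiciousness score per line
    buggy_lines: hypothetical layout, program id -> set of 0-based buggy line indices
    """
    if not scores:
        raise ValueError("no programs to evaluate")

    hits = 0
    for prog_id, line_scores in scores.items():
        if not line_scores:
            # A program with no scored lines cannot produce a correct prediction.
            continue
        # 0-based index of the line the localizer ranks as most suspicious.
        predicted = max(range(len(line_scores)), key=line_scores.__getitem__)
        if predicted in buggy_lines.get(prog_id, set()):
            hits += 1
    return hits / len(scores)


# Toy data, invented purely for illustration.
example_scores = {
    "prog_a": [0.1, 0.7, 0.2],  # line 1 is ranked first
    "prog_b": [0.5, 0.3, 0.2],  # line 0 is ranked first
}
example_labels = {
    "prog_a": {1},  # prediction matches the label
    "prog_b": {2},  # prediction misses the label
}
example_result = top1_accuracy(example_scores, example_labels)
```

On the toy data, one of the two programs is localized correctly, so `example_result` is 0.5. The abstract does not state how the 34.6% and 93.4% figures are derived from per-dataset accuracies, so this sketch covers only the per-dataset metric.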