SE · AI · Nov 15, 2024

An Empirical Study on LLM-based Agents for Automated Bug Fixing

arXiv:2411.10213v2 · 36 citations · h-index: 8
Originality: Synthesis-oriented
AI Analysis

The paper provides a systematic analysis for researchers and practitioners in software engineering and AI, but it is incremental: it builds on existing work without introducing new methods.

The paper empirically studied six LLM-based agent systems for automated bug fixing on the SWE-bench Verified benchmark, analyzing their performance variations, fault localization accuracy, and bug reproduction capabilities, concluding that both LLM capabilities and agent design need optimization for better effectiveness.

Large language models (LLMs) and LLM-based agents have been applied to fix bugs automatically, demonstrating their capability to address software defects by interacting with the development environment, validating iteratively, and modifying code. However, systematic analysis of these agent systems remains limited, particularly regarding the performance variations among the top-performing ones. In this paper, we examine six repair systems on the SWE-bench Verified benchmark for automated bug fixing. We first assess each system's overall performance, noting the instances solvable by all or none of these systems, and explore the capabilities of the different systems. We also compare fault localization accuracy at the file and code-symbol levels and evaluate bug reproduction capabilities. From this analysis, we conclude that further optimization is needed in both the LLM capability itself and the design of the agentic flow to improve the effectiveness of agents in bug fixing.
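Two of the comparisons described in the abstract, finding the instances solvable by all or by none of the systems and measuring file-level fault localization accuracy, reduce to simple set operations. The following is a minimal sketch under stated assumptions, not the paper's evaluation code: the system names, instance IDs, and file paths are hypothetical placeholders, and the file-level criterion (a hit when every file touched by the reference patch is among the files the agent edited) is an assumed definition, since the exact metric is not spelled out here.

```python
from typing import Dict, Set, Tuple


def solved_by_all_and_none(
    results: Dict[str, Set[str]], all_instances: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Split instances into those resolved by every system and by no system.

    `results` maps a (hypothetical) system name to the set of instance IDs
    that system resolved.
    """
    solved_sets = list(results.values())
    if not solved_sets:
        return set(), set(all_instances)
    by_all = set.intersection(*solved_sets)  # resolved by every system
    by_any = set.union(*solved_sets)         # resolved by at least one system
    return by_all, all_instances - by_any


def file_level_hit(edited_files: Set[str], gold_files: Set[str]) -> bool:
    # ASSUMPTION: a file-level hit means the agent's patch touches every file
    # that the reference (gold) patch touches; the study may define it differently.
    return gold_files <= edited_files


def file_level_accuracy(
    edited: Dict[str, Set[str]], gold: Dict[str, Set[str]]
) -> float:
    """Fraction of instances whose gold-patch files are all covered by the agent's edits."""
    if not gold:
        return 0.0
    hits = sum(
        file_level_hit(edited.get(iid, set()), files) for iid, files in gold.items()
    )
    return hits / len(gold)


if __name__ == "__main__":
    # Toy data with placeholder IDs; these are not results from the paper.
    instances = {"inst-1", "inst-2", "inst-3", "inst-4"}
    results = {"agent_a": {"inst-1", "inst-2"}, "agent_b": {"inst-1", "inst-3"}}
    by_all, by_none = solved_by_all_and_none(results, instances)
    print(sorted(by_all), sorted(by_none))  # ['inst-1'] ['inst-4']

    edited = {"inst-1": {"src/x.py", "src/y.py"}, "inst-2": {"src/z.py"}}
    gold = {"inst-1": {"src/x.py"}, "inst-2": {"src/w.py"}}
    print(file_level_accuracy(edited, gold))  # 0.5
```

Keeping each system's results as a set makes the "all" and "none" groups a single intersection and a single union. A symbol-level variant could follow the same pattern with (file, function) pairs in place of file paths, again under whatever matching rule the study actually uses.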
