SELGOct 3, 2023

Challenging Bug Prediction and Repair Models with Synthetic Bugs

arXiv:2310.02407v35 citationsh-index: 15
AI Analysis

This addresses the need for realistic bug datasets to test software engineering tools, though it is incremental as it builds on existing bug generation methods.

The paper tackles the challenge of evaluating bug prediction and repair models by generating synthetic bugs that are both hard to detect and hard to repair, using BugFarm to create 435k+ bugs from over 1.9M mutants, which outperformed alternative approaches in making bugs difficult for learning-based techniques.

Bugs are essential in software engineering; many research studies in the past decades have been proposed to detect, localize, and repair bugs in software systems. Effectiveness evaluation of such techniques requires complex bugs, i.e., those that are hard to detect through testing and hard to repair through debugging. From the classic software engineering point of view, a hard-to-repair bug differs from the correct code in multiple locations, making it hard to localize and repair. Hard-to-detect bugs, on the other hand, manifest themselves under specific test inputs and reachability conditions. These two objectives, i.e., generating hard-to-detect and hard-to-repair bugs, are mostly aligned; a bug generation technique can change multiple statements to be covered only under a specific set of inputs. However, these two objectives are conflicting for learning-based techniques: A bug should have a similar code representation to the correct code in the training data to challenge a bug prediction model to distinguish them. The hard-to-repair bug definition remains the same but with a caveat: the more a bug differs from the original code, the more distant their representations are and easier to be detected. We propose BugFarm, to transform arbitrary code into multiple complex bugs. BugFarm leverages LLMs to mutate code in multiple locations (hard-to-repair). To ensure that multiple modifications do not notably change the code representation, BugFarm analyzes the attention of the underlying model and instructs LLMs to only change the least attended locations (hard-to-detect). Our comprehensive evaluation of 435k+ bugs from over 1.9M mutants generated by BUGFARM and two alternative approaches demonstrates our superiority in generating bugs that are hard to detect by learning-based bug prediction approaches and hard-to-repair by state-of-the-art learning-based program repair technique.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes