SEFeb 5, 2021

Evaluating SZZ Implementations Through a Developer-informed Oracle

arXiv:2102.03300v155 citations
Originality Incremental advance
AI Analysis

This work provides a more reliable method for evaluating SZZ algorithm accuracy, which is crucial for researchers and practitioners who rely on SZZ for defect prediction and bug analysis, by addressing the limitation of researcher-created oracles.

The paper addresses the challenge of evaluating SZZ algorithm implementations by proposing a methodology to build a "developer-informed" oracle. This oracle, created using NLP to identify developer-referenced bug-inducing commits and manual filtering, was then used to evaluate several SZZ variants and distill lessons for improvement.

The SZZ algorithm for identifying bug-inducing changes has been widely used to evaluate defect prediction techniques and to empirically investigate when, how, and by whom bugs are introduced. Over the years, researchers have proposed several heuristics to improve the SZZ accuracy, providing various implementations of SZZ. However, fairly evaluating those implementations on a reliable oracle is an open problem: SZZ evaluations usually rely on (i) the manual analysis of the SZZ output to classify the identified bug-inducing commits as true or false positives; or (ii) a golden set linking bug-fixing and bug-inducing commits. In both cases, these manual evaluations are performed by researchers with limited knowledge of the studied subject systems. Ideally, there should be a golden set created by the original developers of the studied systems. We propose a methodology to build a "developer-informed" oracle for the evaluation of SZZ variants. We use Natural Language Processing (NLP) to identify bug-fixing commits in which developers explicitly reference the commit(s) that introduced a fixed bug. This was followed by a manual filtering step aimed at ensuring the quality and accuracy of the oracle. Once built, we used the oracle to evaluate several variants of the SZZ algorithm in terms of their accuracy. Our evaluation helped us to distill a set of lessons learned to further improve the SZZ algorithm.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes