SEMay 19, 2020

Do Code Review Measures Explain the Incidence of Post-Release Defects?

arXiv:2005.09217v116 citations
Originality Synthesis-oriented
AI Analysis

This research addresses software engineering practitioners by showing that code review measures may not directly predict post-release defects, but it is incremental as it replicates and extends prior work.

The study investigated whether code review measures can explain post-release defects, finding that models without code review predictors performed as well or better, and that prior defects, module size, and authorship had the strongest relationships to defects.

Aim: In contrast to studies of defects found during code review, we aim to clarify whether code reviews measures can explain the prevalence of post-release defects. Method: We replicate a study by McIntoshet. al that uses additive regression to model the relationship between defects and code reviews. To increase external validity, we apply the same methodology on a new software project. We discuss our findings with the first author of the original study, McIntosh. We then investigate how to reduce the impact of correlated predictors in the variable selection process and how to increase understanding of the inter-relationships among the predictors by employing Bayesian Network (BN) models. Context: As in the original study, we use the same measures authors obtained for Qt project in the original study. We mine data from version control and issue tracker of Google Chrome and operationalize measures that are close analogs to the large collection of code, process, and code review measures used in the replicated the study. Results: Both the data from the original study and the Chrome data showed high instability of the influence of code review measures on defects with the results being highly sensitive to variable selection procedure. Models without code review predictors had as good or better fit than those with review predictors. Replication, however, confirms with the bulk of prior work showing that prior defects, module size, and authorship have the strongest relationship to post-release defects. The application of BN models helped explain the observed instability by demonstrating that the review-related predictors do not affect post-release defects directly and showed indirect effects. For example, changes that have no review discussion tend to be associated with files that have had many prior defects which in turn increase the number of post-release defects.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes