SE · Nov 17, 2021

Are automated static analysis tools worth it? An investigation into relative warning density and external software quality

arXiv:2111.09188v2
Originality: Synthesis-oriented
AI Analysis

This research addresses the practical value of ASATs for software developers by evaluating their relationship to defects. It is incremental: it builds on existing studies and adds analyses specific to one tool and a set of warning-density metrics.

The study investigated whether automated static analysis tools (ASATs), using PMD for Java as the example, have a measurable impact on external software quality by relating warning density to defects. It found that bug-inducing files had fewer warnings than other files, but this was explained by the overall decreasing warning density. The statistically significant differences it did find had negligible effect sizes.

Automated Static Analysis Tools (ASATs) are part of software development best practices. ASATs warn developers about potential problems in the code. On the one hand, ASATs are based on best practices, so there should be a noticeable effect on software quality. On the other hand, ASATs suffer from false positive warnings, which developers have to inspect and then ignore or mark as invalid. In this article, we ask whether ASATs have a measurable impact on external software quality, using the example of PMD for Java. We investigate the relationship between ASAT warnings emitted by PMD and defects, per change and per file. Our case study includes data for the history of each file as well as the differences between changed files and the project in which they are contained. We investigate whether files that induce a defect have more static analysis warnings than the rest of the project. Moreover, we investigate the impact of two different sets of ASAT rules. We find that bug-inducing files contain fewer static analysis warnings than other files of the project at that point in time. However, this can be explained by the overall decreasing warning density. When compared with all other changes, we find a statistically significant difference in one metric for all rules and two metrics for a subset of rules. However, the effect size is negligible in all cases, showing that the actual difference in warning density between bug-inducing changes and other changes is small at best.
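
The comparison described in the abstract can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define warning density or name an effect size measure, so the sketch assumes density means warnings per line of code and uses Cliff's delta as one common non-parametric effect size. The names `FileSnapshot`, `warning_density`, `relative_density`, and `cliffs_delta` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FileSnapshot:
    """Hypothetical record: one file at one point in the project history."""
    path: str
    warnings: int       # number of ASAT warnings reported for the file
    lines_of_code: int  # size used to normalise the warning count (assumption)


def warning_density(files: Sequence[FileSnapshot]) -> float:
    """Warnings per line of code over a set of files (0.0 if there is no code)."""
    total_loc = sum(f.lines_of_code for f in files)
    if total_loc == 0:
        return 0.0
    return sum(f.warnings for f in files) / total_loc


def relative_density(changed: Sequence[FileSnapshot],
                     project: Sequence[FileSnapshot]) -> float:
    """Density of the changed files minus the density of the rest of the project.

    A negative value means the changed files carry fewer warnings per line
    than the other files at that point in time.
    """
    changed_paths = {f.path for f in changed}
    rest = [f for f in project if f.path not in changed_paths]
    return warning_density(changed) - warning_density(rest)


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta in [-1, 1]: P(x > y) - P(x < y) over all pairs.

    Assumed here as the effect size; the abstract only says "negligible".
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    project = [
        FileSnapshot("A.java", warnings=10, lines_of_code=100),
        FileSnapshot("B.java", warnings=2, lines_of_code=100),
        FileSnapshot("C.java", warnings=6, lines_of_code=100),
    ]
    bug_inducing = [project[1]]
    print(relative_density(bug_inducing, project))
    print(cliffs_delta([1, 2], [2, 3]))
```

In a study of this kind, one sample passed to `cliffs_delta` would hold the relative densities of bug-inducing changes and the other those of all remaining changes. A delta close to zero alongside a significant test result corresponds to the situation the abstract reports: a detectable but practically small difference.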

