Moving Beyond the Mean: Analyzing Variance in Software Engineering Experiments
This paper addresses a methodological gap in software engineering research by highlighting the importance of variance analysis. The contribution is incremental, since it builds on established statistical concepts from other disciplines.
The paper tackles the problem that software engineering experiments rely solely on mean-based analyses, which can be misleading when variances differ across treatments. Through simulation and a real industrial experiment on Test-Driven Development, it demonstrates that considering variance can better inform the risks of technology adoption, and that technologies with smaller variances may be more desirable when the goal is to minimize risk.
Software Engineering (SE) experiments are traditionally analyzed with statistical tests (e.g., t-tests, ANOVAs) that assume equally spread data across treatments (i.e., the homogeneity of variances assumption). In SE, differences across treatments' variances are not seen as an opportunity to gain insight into technology performance but as a hindrance to analyzing the data. We have studied the role of variance in mature experimental disciplines such as medicine. We illustrate, by means of simulation, the extent to which variance may inform technology performance. We analyze a real-life industrial experiment on Test-Driven Development (TDD) where variance may impact technology desirability. Evaluating the performance of technologies based on means alone, as traditionally done in SE, may be misleading. Technologies that make developers perform more similarly to one another (i.e., technologies with smaller variances) may be more suitable if the aim is to minimize the risk of adopting them in real practice.
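The point about means hiding risk can be illustrated with a small simulation. The sketch below is not the simulation used in the paper: the two technologies, their means and standard deviations, the normality assumption, and the acceptability threshold are all hypothetical values chosen for illustration. It draws developer scores for two technologies with the same mean but different spreads and compares how often a developer falls below the threshold.

```python
import random
import statistics


def simulate_scores(mean, sd, n, rng):
    """Draw n hypothetical developer scores from a normal distribution."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def share_below(scores, threshold):
    """Fraction of scores that fall below an acceptability threshold."""
    return sum(s < threshold for s in scores) / len(scores)


# Fixed seed so the run is reproducible.
rng = random.Random(42)

# Assumed values: both technologies share the same mean score,
# but technology B spreads developers out four times as much.
tech_a = simulate_scores(mean=70.0, sd=5.0, n=10_000, rng=rng)
tech_b = simulate_scores(mean=70.0, sd=20.0, n=10_000, rng=rng)

# Hypothetical minimum score an adopting team would accept.
threshold = 50.0

for name, scores in (("A", tech_a), ("B", tech_b)):
    print(
        f"Technology {name}: "
        f"mean={statistics.mean(scores):.1f}, "
        f"sd={statistics.stdev(scores):.1f}, "
        f"share below {threshold}={share_below(scores, threshold):.3f}"
    )
```

Under these assumptions, a comparison of means would find the two technologies practically indistinguishable, while the share of developers below the threshold is much larger for the higher-variance technology. That difference in adoption risk is the kind of information a mean-only analysis discards.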