SE · Nov 9, 2021

Test cases as a measurement instrument in experimentation

arXiv:2111.05287v2 · 2 citations
Originality: Synthesis-oriented
AI Analysis

The paper highlights a reproducibility issue in software engineering experiments: differences in how test cases are constructed can significantly alter experimental outcomes.

The study investigated how different test suites affect measurement accuracy in software engineering experiments, finding that response variable values can vary by up to ±60% depending on the test suite used.
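One way to express a "±60%" spread is as a per-subject relative difference between two measurements of the same response variable. The sketch below assumes the difference is taken relative to the mean of each pair and that scores are non-negative. The paper may define the spread differently, and `percent_differences` is a hypothetical helper, not part of the study's published code.

```python
def percent_differences(ad_hoc, eq_part):
    """Per-subject relative difference (%) between two test-suite measurements.

    Assumption: the difference is expressed relative to the mean of each pair,
    and both lists hold non-negative scores for the same subjects, in order.
    """
    if len(ad_hoc) != len(eq_part):
        raise ValueError("both test suites must score the same subjects")
    diffs = []
    for a, b in zip(ad_hoc, eq_part):
        pair_mean = (a + b) / 2
        # Two zero scores agree perfectly; avoid dividing by zero.
        diffs.append(0.0 if pair_mean == 0 else 100.0 * (b - a) / pair_mean)
    return diffs


# Hypothetical scores for three subjects (e.g., % of test cases passed).
diffs = percent_differences([50, 80, 60], [70, 40, 60])
print([round(d, 1) for d in diffs])  # [33.3, -66.7, 0.0]
```

A large value for one subject means the two suites disagree strongly about that subject's result, even if the suites' averages happen to be close.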

Background: Test suites are frequently used to quantify relevant software attributes, such as quality or productivity.
Problem: We have detected that the same response variable, measured using different test suites, yields different experimental results.
Aims: Assess the extent to which differences in test case construction influence measurement accuracy and experimental outcomes.
Method: Two industry experiments were measured using two different test suites, one generated with an ad-hoc method and the other with equivalence partitioning. The accuracy of the measures was studied using standard procedures, such as ISO 5725, Bland-Altman analysis, and Intraclass Correlation Coefficients (ICC).
Results: The values of the response variables differ by up to ±60%, depending on the test suite (ad-hoc vs. equivalence partitioning) used.
Conclusions: Disclosing datasets and analysis code is insufficient to ensure the reproducibility of SE experiments. Experimenters should disclose all experimental materials needed to perform independent measurement and re-analysis.
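The Bland-Altman and ICC procedures named in the Method can be sketched in plain Python. This is a minimal illustration, not the authors' analysis code. It assumes each subject is measured once by each of the two test suites and reports the Bland-Altman bias with 95% limits of agreement (bias ± 1.96 standard deviations of the differences). It uses the Shrout and Fleiss ICC(2,1) form (two-way random effects, absolute agreement, single measurement); the paper may have used a different ICC form. The function names `bland_altman` and `icc_2_1` are hypothetical.

```python
import statistics


def bland_altman(x, y):
    """Bland-Altman bias and 95% limits of agreement for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(x, y)]  # second suite minus first suite
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_2_1(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements")
    data = [[a, b] for a, b in zip(x, y)]  # rows = subjects, columns = test suites
    n, k = len(data), 2
    grand = statistics.mean(v for row in data for v in row)
    row_means = [statistics.mean(row) for row in data]
    col_means = [statistics.mean(row[j] for row in data) for j in range(k)]
    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-suites mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: the second suite rates every subject one point higher.
ad_hoc = [1, 2, 3, 4, 5]
eq_part = [2, 3, 4, 5, 6]
print(bland_altman(ad_hoc, eq_part))
print(round(icc_2_1(ad_hoc, eq_part), 3))  # 0.833
```

Because ICC(2,1) measures absolute agreement, a suite that scores every subject a constant amount higher lowers the ICC below 1, even though the two sets of scores are perfectly correlated. Bland-Altman makes the same systematic shift visible as a non-zero bias.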

Code Implementations: 1 repo