SE · Jun 26, 2018

How Do Static and Dynamic Test Case Prioritization Techniques Perform on Modern Software Systems? An Extensive Study on GitHub Projects

Qi Luo, Kevin Moran, Lingming Zhang, Denys Poshyvanyk

arXiv:1806.09774v1 · 11.948 citations

Originality: Synthesis-oriented

AI Analysis

This work provides insights for software engineers using regression testing in agile environments, though it is incremental as it compares existing techniques on new data.

The study empirically evaluated static and dynamic test case prioritization techniques on 58 real-world Java programs. It found that static techniques can be surprisingly effective, particularly when measured by APFDc, and that techniques tend to perform better on larger programs, although program size does not change how the techniques compare with one another.

Test Case Prioritization (TCP) is an increasingly important regression testing technique for reordering test cases according to a pre-defined goal, particularly as agile practices gain adoption. To better understand these techniques, we perform the first extensive study aimed at empirically evaluating four static TCP techniques, comparing them with state-of-research dynamic TCP techniques across several quality metrics. This study was performed on 58 real-world Java programs encompassing 714 KLoC and results in several notable observations. First, our results across two effectiveness metrics (the Average Percentage of Faults Detected, APFD, and the cost-cognizant APFDc) illustrate that at test-class granularity, these metrics tend to correlate, but this correlation does not hold at test-method granularity. Second, our analysis shows that static techniques can be surprisingly effective, particularly when measured by APFDc. Third, we found that TCP techniques tend to perform better on larger programs, but that program size does not affect comparative performance measures between techniques. Fourth, software evolution does not significantly impact comparative performance results between TCP techniques. Fifth, neither the number nor the type of mutants utilized dramatically impacts measures of TCP effectiveness under typical experimental settings. Finally, our similarity analysis illustrates that highly prioritized test cases tend to uncover dissimilar faults.
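The abstract names APFD and APFDc without defining them. As a rough illustration, the sketch below computes both using the definitions commonly given in the regression-testing literature; these formulas are an assumption here, since this page does not state them. The function and parameter names (`apfd`, `apfdc`, `first_detect`, `costs`, `severities`) are hypothetical, and the sketch assumes every fault is detected by at least one test in the ordering.

```python
from typing import Sequence


def apfd(first_detect: Sequence[int], n_tests: int) -> float:
    """APFD for one test ordering (commonly cited definition, assumed here).

    first_detect[i] is the 1-based position in the ordering of the first
    test that reveals fault i; n_tests is the number of tests.
    """
    m = len(first_detect)
    return 1.0 - sum(first_detect) / (n_tests * m) + 1.0 / (2 * n_tests)


def apfdc(
    first_detect: Sequence[int],
    costs: Sequence[float],
    severities: Sequence[float],
) -> float:
    """Cost-cognizant APFDc (commonly cited definition, assumed here).

    costs[j] is the cost (for example, execution time) of the test at
    position j+1; severities[i] is the severity weight of fault i.
    """
    total_cost = sum(costs)
    total_severity = sum(severities)
    numerator = 0.0
    for tf, severity in zip(first_detect, severities):
        # Cost of the detecting test and every test after it,
        # minus half the cost of the detecting test itself.
        remaining = sum(costs[tf - 1:])
        numerator += severity * (remaining - 0.5 * costs[tf - 1])
    return numerator / (total_cost * total_severity)


# Five tests; two faults first revealed by the tests at positions 1 and 2.
positions = [1, 2]
uniform = apfdc(positions, [1, 1, 1, 1, 1], [1, 1])
skewed = apfdc(positions, [4, 1, 1, 1, 1], [1, 1])
```

With unit costs and unit severities, APFDc reduces to APFD. Making the first test four times as expensive lowers APFDc while APFD stays the same, which is one way the two metrics can rank the same ordering differently.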
