SE · Jan 18, 2018

A Large-Scale Empirical Comparison of Static and Dynamic Test Case Prioritization Techniques

arXiv:1801.05917v1 · 70 citations
Originality: Incremental advance
AI Analysis

This work addresses a gap in software testing research by providing a comprehensive comparison for practitioners, though it is incremental as it builds on existing prioritization techniques.

The authors conducted the first large-scale empirical study comparing static and dynamic test case prioritization techniques on 30 Java programs. They find that static call-graph-based methods are the most effective and efficient at the class level, while topic-model-based methods excel at the method level. The faults detected by static and dynamic techniques are largely dissimilar, with only 25% to 30% agreement in the top 10% of test cases.

The large body of existing research in Test Case Prioritization (TCP) techniques can be broadly classified into two categories: dynamic techniques (that rely on run-time execution information) and static techniques (that operate directly on source and test code). Absent from this current body of work is a comprehensive study aimed at understanding and evaluating the static approaches and comparing them to dynamic approaches on a large set of projects. In this work, we perform the first extensive study aimed at empirically evaluating four static TCP techniques and comparing them with state-of-research dynamic TCP techniques at different test-case granularities (e.g., method and class level) in terms of effectiveness, efficiency, and similarity of faults detected. This study was performed on 30 real-world Java programs encompassing 431 KLoC. In terms of effectiveness, we find that the static call-graph-based technique outperforms the other static techniques at test-class level, but the topic-model-based technique performs better at test-method level. In terms of efficiency, the static call-graph-based technique is also the most efficient when compared to other static techniques. When examining the similarity of faults detected for the four static techniques compared to the four dynamic ones, we find that on average, the faults uncovered by these two groups of techniques are quite dissimilar, with the top 10% of test cases agreeing on only 25% to 30% of detected faults. This prompts further research into the severity/importance of faults uncovered by these techniques, and into the potential for combining static and dynamic information for more effective approaches.
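To make the "agreement in the top 10% of test cases" figure concrete, here is a minimal sketch of one way such an overlap could be computed. The abstract does not state the exact measure, so the sketch assumes Jaccard similarity over the fault sets detected by the first 10% of tests in each prioritized ordering. The function names, test names, and fault labels are all hypothetical and serve only as illustration.

```python
import math

def faults_in_top(order, detects, percent=10):
    """Faults detected by the first `percent` percent of tests in `order`.

    `order` is a prioritized list of test names; `detects` maps a test name
    to the set of faults it reveals (both are illustrative inputs).
    """
    cutoff = max(1, math.ceil(len(order) * percent / 100))
    found = set()
    for test in order[:cutoff]:
        found |= detects.get(test, set())
    return found

def jaccard(a, b):
    """Jaccard similarity of two sets (assumed here as the agreement measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Toy data: 20 tests, so the top 10% is the first 2 tests of each ordering.
tests = [f"t{i}" for i in range(1, 21)]
detects = {"t1": {"f1"}, "t2": {"f2", "f3"}, "t5": {"f4"}}

static_order = tests                                   # t1, t2, t3, ...
dynamic_order = ["t2", "t5"] + [t for t in tests if t not in ("t2", "t5")]

static_faults = faults_in_top(static_order, detects)   # faults from t1, t2
dynamic_faults = faults_in_top(dynamic_order, detects) # faults from t2, t5
agreement = jaccard(static_faults, dynamic_faults)
print(f"agreement in top 10%: {agreement:.2f}")
```

In this toy case the two orderings share two of the four faults found by their top tests, which gives an agreement of 0.50. A value in the 0.25 to 0.30 range, as reported in the abstract, would indicate that the two groups of techniques surface mostly different faults early on.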

