SE · Oct 14, 2021

Identifying Similar Test Cases That Are Specified in Natural Language

arXiv:2110.07733v1 · 22 citations
Originality: Incremental advance
AI Analysis

This work addresses the high cost and manual effort of software testing in industries that rely on natural-language test cases, though it is an incremental advance, since it builds on existing text embedding and clustering techniques.

The paper tackles the problem of redundant test cases in software testing by proposing an unsupervised approach to identify similar test cases specified in natural language, achieving F-scores of 87.39% for clustering test steps and 83.47% for identifying similar test cases in an industrial evaluation.

Software testing is still a manual process in many industries, despite recent improvements in automated testing techniques. As a result, test cases are often specified in natural language by different employees, and many redundant test cases might exist in the test suite. This increases the (already high) cost of test execution. Manually identifying similar test cases is a time-consuming and error-prone task. Therefore, in this paper, we propose an unsupervised approach to identify similar test cases. Our approach uses a combination of text embedding, text similarity, and clustering techniques. To cluster similar test steps, we evaluate five different text embedding techniques, two text similarity metrics, and two clustering techniques; to identify similar test cases from the resulting test step clusters, we evaluate four techniques. Through an evaluation in an industrial setting, we show that our approach performs well at clustering test steps (an F-score of 87.39%) and at identifying similar test cases (an F-score of 83.47%). Furthermore, a validation with developers indicates several practical uses of our approach (such as identifying redundant and legacy test cases), which help reduce manual testing effort and time.
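The two-stage idea described above (cluster individual test steps first, then compare test cases through the clusters their steps fall into) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the five embeddings, two similarity metrics, two clustering techniques, or four test-case techniques it evaluates, so this sketch swaps in smoothed TF-IDF vectors as the embedding, cosine similarity as the metric, threshold-based single-linkage clustering as the clustering step, and Jaccard similarity over step-cluster sets to compare test cases. The function names and the thresholds `step_threshold=0.5` and `case_threshold=0.7` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a test step and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Assumed stand-in for a text embedding: sparse smoothed TF-IDF vectors as dicts."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_steps(steps, threshold=0.5):
    """Hypothetical single-linkage clustering: any two steps with cosine >= threshold
    end up in the same cluster (merged with a union-find). Returns one label per step."""
    vecs = tfidf_vectors(steps)
    parent = list(range(len(steps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(steps)):
        for j in range(i + 1, len(steps)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(steps))]


def similar_test_cases(test_cases, step_threshold=0.5, case_threshold=0.7):
    """Represent each test case by the set of step clusters it touches, then report
    pairs of test cases whose Jaccard similarity is at least case_threshold."""
    all_steps = [s for steps in test_cases.values() for s in steps]
    labels = cluster_steps(all_steps, step_threshold)
    case_clusters, k = {}, 0
    for name, steps in test_cases.items():
        case_clusters[name] = set(labels[k:k + len(steps)])
        k += len(steps)
    names = list(test_cases)
    pairs = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            ca, cb = case_clusters[names[a]], case_clusters[names[b]]
            union = ca | cb
            jaccard = len(ca & cb) / len(union) if union else 0.0
            if jaccard >= case_threshold:
                pairs.append((names[a], names[b], jaccard))
    return pairs


if __name__ == "__main__":
    # Hypothetical test suite: TC1 and TC2 are paraphrases, TC3 is unrelated.
    suite = {
        "TC1": ["Open the login page", "Enter valid username and password", "Click the login button"],
        "TC2": ["Open login page", "Enter a valid username and password", "Click login button"],
        "TC3": ["Start the engine", "Check the oil pressure"],
    }
    print(similar_test_cases(suite))
```

With these hypothetical steps, the paraphrased login steps fall into shared clusters, so TC1 and TC2 are reported as a similar pair while TC3 is not. Clustering steps before comparing test cases lets differently worded but equivalent steps count as the same action; in practice, the embedding, metric, and thresholds would be the parts to tune against labeled data.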

Code Implementations: 1 repo