SEApr 23

Generating Project-Specific Test Cases with Requirement Validation Intention

Binhang Qi, Yun Lin, Xinyi Weng, Yuhuan Huang, Chenyan Liu, Hailong Sun, Zhi Jin, Jin Song Dong

arXiv:2507.2061931.94 citationsh-index: 8

Predicted impact top 47% in SE · last 90 daysOriginality Incremental advance

AI Analysis

For software engineers, this improves automated test generation by aligning with practical validation needs, though it is an incremental improvement over retrieval-augmented LLM methods.

IntentionTest generates project-specific test cases from validation intention descriptions, achieving 28.1-37.6% more mutant killing and 23.7-49.0% more passing tests than baselines.

Test cases are valuable assets for maintaining software quality. State-of-the-art automated test generation techniques typically focus on maximizing program branch coverage or translating focal methods into test code. However, in contrast to branch coverage or code-to-test translation, practical tests are written out of the need to validate whether a requirement has been fulfilled. Specifically, each test usually reflects a developer's validation intention for a program function, regarding (1) what is the test scenario of a program function? and (2) what is expected behavior under such a scenario? Without taking such intention into account, generated tests are less likely to be adopted in practice. In this work, we propose IntentionTest, which generates project-specific tests given the description of validation intention. IntentionTest adopts a retrieval-and-edit manner. First, given a focal code and a description of validation intention consisting of a test objective with test precondition and expected results, IntentionTest retrieves a reusable test in the project as the test reference. Then, IntentionTest edits the test reference with an LLM regarding the validation intention toward the target test. We extensively evaluate IntentionTest against four baselines on 3,680 test cases. Compared to state-of-the-art baselines, IntentionTest can (1) generate tests far more semantically relevant to ground-truth tests by (i) killing 28.1% to 37.6% more common mutants and (ii) sharing 16.9% to 23.9% more common coverage; and (2) generate 23.7% to 49.0% more successful passing tests.

View on arXiv PDF

Similar