SE · May 24

Hamster: A Large-Scale Study and Characterization of Developer-Written Tests

arXiv:2509.26204 · 60.91 citations · h-index: 10 · Has Code
Predicted impact: top 36% in SE · last 90 days · Originality: Synthesis-oriented
AI Analysis

For researchers in automated test generation, this work reveals a fundamental mismatch between developer practices and ATG capabilities, highlighting the need for more realistic test generation.

This study characterizes 1.7 million developer-written Java tests, finding that most exhibit features (e.g., mocking, complex assertions) that are beyond the capabilities of current automated test generation tools, and identifies research directions to bridge this gap.

Automated test generation (ATG), which aims to reduce the cost of manual test suite development, has been investigated for decades and has produced countless techniques based on a variety of approaches: symbolic analysis, search-based, random and adaptive-random, learning-based, and, most recently, large-language-model-based approaches. However, despite this large body of research, there is still a gap in our understanding of the characteristics of developer-written tests and, consequently, our assessment of how well ATG techniques and tools can generate realistic and representative tests. To bridge this gap, we conducted an extensive empirical study of developer-written tests for Java applications, covering 1.7 million test cases from open-source repositories. Our study is the first of its kind to evaluate aspects of developer-written tests that are mostly neglected in the existing literature -- including test scope, test fixtures and assertions, types of inputs, and use of mocking -- and characterize tests accordingly. Based on this characterization, we then compare existing tests with those generated by two state-of-the-art ATG tools. Our results highlight that the vast majority of developer-written tests exhibit characteristics that are beyond the capabilities of current ATG tools. Finally, based on our findings, we identify promising research directions that can help develop more effective tool support for developer testing practices. We believe this work can set the stage for additional research and bring ATG tools closer to generating the types of tests developers write.
