SE · Mar 14

Testing with AI Agents: An Empirical Study of Test Generation Frequency, Quality, and Coverage

arXiv:2603.13724 · 10.9 · h-index: 10
Predicted impact: top 58% in SE (last 90 days) · Originality: Incremental advance
AI Analysis

This work addresses software developers' need for empirical insight into AI-generated testing practices, though it is an incremental contribution because it builds on existing datasets and tools.

This study examines how AI agent-based coding tools generate tests in real-world development. Analyzing 2,232 test-related commits, it finds that AI authored 16.4% of test-adding commits and that AI-generated tests show distinct structural patterns while achieving code coverage comparable to human-written tests.
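The frequency figure is a ratio: AI-authored test-adding commits divided by all test-adding commits. The sketch below is a minimal illustration under stated assumptions, not the study's pipeline. The `Commit` record and its fields are hypothetical (not the AIDev schema), and a commit counts as test-related when it touches a path that follows common test-file naming conventions. That heuristic is cruder than detecting newly added test methods.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical commit record; field names are illustrative, not the AIDev schema."""
    sha: str
    authored_by_ai: bool
    changed_paths: list[str] = field(default_factory=list)


# Assumed heuristic: a path is a test file if it sits in a test/ or tests/ directory
# or follows a common naming convention (test_*.py, *_test.go, *.test.ts, *.spec.ts, *Test.java).
TEST_PATH = re.compile(
    r"(^|/)(tests?/|test_[^/]*$|[^/]*_test\.[^/]+$|[^/]*\.(test|spec)\.[^/]+$|[^/]*Tests?\.java$)"
)


def touches_tests(commit: Commit) -> bool:
    """True if any changed path looks like a test file under the heuristic above."""
    return any(TEST_PATH.search(path) for path in commit.changed_paths)


def ai_share_of_test_commits(commits: list[Commit]) -> float:
    """Fraction of test-related commits that were authored by an AI agent."""
    test_commits = [c for c in commits if touches_tests(c)]
    if not test_commits:
        return 0.0
    return sum(c.authored_by_ai for c in test_commits) / len(test_commits)


commits = [
    Commit("a1", True, ["tests/test_parser.py"]),
    Commit("b2", False, ["src/parser.py"]),
    Commit("c3", False, ["pkg/io_test.go"]),
    Commit("d4", False, ["web/app.spec.ts", "web/app.ts"]),
]
print(f"{ai_share_of_test_commits(commits):.1%}")  # 33.3%
```

Only `touches_tests` would need to change to swap in a finer criterion, such as counting only commits that add new test functions.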

Agent-based coding tools have transformed software development practices. Unlike prompt-based approaches, which require developers to integrate generated code manually, these agent-based tools autonomously interact with repositories to create, modify, and execute code, including tests. While many developers have adopted agent-based coding tools, little is known about how these tools generate tests in real-world development scenarios or how AI-generated tests compare to human-written ones. This study presents an empirical analysis of test generation by agent-based coding tools using the AIDev dataset. We extracted 2,232 commits containing test-related changes and investigated three aspects: the frequency of test additions, the structural characteristics of the generated tests, and their impact on code coverage. Our findings reveal that (i) AI authored 16.4% of all commits adding tests in real-world repositories, (ii) AI-generated test methods exhibit distinct structural patterns, featuring longer code and a higher density of assertions while maintaining lower cyclomatic complexity through linear logic, and (iii) AI-generated tests contribute to code coverage at a level comparable to human-written tests, frequently achieving positive coverage gains across several projects.
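For Python tests, the standard `ast` module can approximate the structural and coverage measures named in the abstract. This is a hedged sketch, not the authors' tooling, and `measure_test_method` and `coverage_gain` are hypothetical helpers. It makes these assumptions:

- Lines of code run from the function's first line to its last.
- Assertions are `assert` statements plus calls whose name starts with `assert`, such as `self.assertEqual`.
- Cyclomatic complexity is a simplified McCabe count: 1, plus one per branch, loop, exception handler or comprehension, plus one per extra boolean operand.
- Coverage gain is the percentage-point change in line coverage between a commit's parent and the commit itself.

```python
import ast

# Node types counted as one extra decision point each (simplified McCabe count; an assumption).
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp,
                  ast.ExceptHandler, ast.comprehension)


def measure_test_method(source: str) -> dict:
    """Structural metrics for the first function defined in `source` (hypothetical helper)."""
    tree = ast.parse(source)
    func = next(node for node in ast.walk(tree)
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)))
    loc = func.end_lineno - func.lineno + 1  # physical lines, including blanks and comments
    assertions = 0
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, ast.Assert):
            assertions += 1
        elif isinstance(node, ast.Call):
            # Count unittest/mock-style calls such as self.assertEqual(...) or assert_called_once().
            callee = node.func
            name = callee.attr if isinstance(callee, ast.Attribute) else getattr(callee, "id", "")
            if name.startswith("assert"):
                assertions += 1
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two decision points.
            complexity += len(node.values) - 1
    return {
        "loc": loc,
        "assertions": assertions,
        "assertion_density": assertions / loc,
        "cyclomatic_complexity": complexity,
    }


def coverage_gain(before: tuple[int, int], after: tuple[int, int]) -> float:
    """Percentage-point change in line coverage; each argument is (covered_lines, total_lines)."""
    (covered_before, total_before), (covered_after, total_after) = before, after
    return 100.0 * (covered_after / total_after - covered_before / total_before)


SAMPLE = '''
def test_parse_splits_on_commas():
    items = parse("a,b")
    assert len(items) == 2
    assert items[0] == "a"
    assert items[1] == "b"
'''

print(measure_test_method(SAMPLE))
# {'loc': 5, 'assertions': 3, 'assertion_density': 0.6, 'cyclomatic_complexity': 1}
print(f"{coverage_gain((400, 1000), (430, 1000)):+.1f} pp")  # +3.0 pp
```

A linear test with several assertions, like `SAMPLE`, scores high on assertion density while its complexity stays at 1. That is the profile the study reports for AI-generated test methods.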
