Kosei Horikawa

6.1 · SE · Mar 14

Do AI Agents Really Improve Code Readability?

Kyogo Horikawa, Kosei Horikawa, Yutaro Kashiwa et al.

Code readability is fundamental to software quality and maintainability. Poor readability extends development time, increases bug-inducing risks, and contributes to technical debt. With the rapid advancement of Large Language Models, AI agent-based approaches have emerged as a promising paradigm for automated refactoring, capable of decomposing complex tasks through autonomous planning and execution. While prior studies have examined refactoring by AI agents, these analyses cover all forms of refactoring, including performance optimization and structural improvement. As a result, the extent to which AI agent-based refactoring specifically improves code readability remains unclear. This study investigates the impact of AI agent-based refactoring on code readability. We extracted commits containing readability-related keywords from the AIDev dataset and analyzed changes in readability metrics before and after each commit, covering 403 commits evaluated using multiple quantitative metrics. Our results indicate that AI agents primarily target logic complexity (42.4%) and documentation improvements (24.2%) rather than surface-level aspects like naming conventions or formatting. However, contrary to expectations, readability-focused commits often degraded traditional quality metrics: the Maintainability Index decreased in 56.1% of commits, while Cyclomatic Complexity increased in 42.7%.
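The before/after comparison described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the keyword list, the function names, and the simplified complexity count (one plus the number of decision points, computed for a whole Python module with the standard `ast` module) are all assumptions made for the example.

```python
import ast

# Hypothetical keyword list; the study's actual keywords are not given here.
READABILITY_KEYWORDS = ("readability", "readable", "clean up", "simplify")


def mentions_readability(message: str) -> bool:
    """Return True if a commit message contains any assumed keyword."""
    lowered = message.lower()
    return any(keyword in lowered for keyword in READABILITY_KEYWORDS)


# Node types counted as one decision point each (a simplification).
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity of a module: 1 + decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def complexity_delta(before: str, after: str) -> int:
    """Positive values mean the commit increased complexity."""
    return cyclomatic_complexity(after) - cyclomatic_complexity(before)


BEFORE = """
def label(x):
    if x > 0 and x < 10:
        return "small"
    return "other"
"""

AFTER = """
def is_small(x):
    return 0 < x < 10

def label(x):
    if is_small(x):
        return "small"
    return "other"
"""

if __name__ == "__main__":
    if mentions_readability("Refactor: improve readability of label()"):
        print("complexity delta:", complexity_delta(BEFORE, AFTER))
```

Aggregating the sign of such deltas over many commits would give the kind of "increased in X% of commits" figure the abstract reports, though the study's own tooling and metric definitions may differ.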

10.9 · SE · Mar 14

Testing with AI Agents: An Empirical Study of Test Generation Frequency, Quality, and Coverage

Suzuka Yoshimoto, Shun Fujita, Kosei Horikawa et al.

Agent-based coding tools have transformed software development practices. Unlike prompt-based approaches that require developers to manually integrate generated code, these agent-based tools autonomously interact with repositories to create, modify, and execute code, including test generation. While many developers have adopted agent-based coding tools, little is known about how these tools generate tests in real-world development scenarios or how AI-generated tests compare to human-written ones. This study presents an empirical analysis of test generation by agent-based coding tools using the AIDev dataset. We extracted 2,232 commits containing test-related changes and investigated three aspects: the frequency of test additions, the structural characteristics of the generated tests, and their impact on code coverage. Our findings reveal that (i) AI authored 16.4% of all commits adding tests in real-world repositories, (ii) AI-generated test methods exhibit distinct structural patterns, featuring longer code and a higher density of assertions while maintaining lower cyclomatic complexity through linear logic, and (iii) AI-generated tests contribute to code coverage comparable to human-written tests, frequently achieving positive coverage gains across several projects.
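As a rough illustration of the structural characteristics mentioned in this abstract (test length and assertion density), the sketch below measures both for Python test functions. It is an assumption-laden example rather than the study's method: the `test` name prefix, the rule for what counts as an assertion, and the function name `assertion_stats` are all hypothetical choices.

```python
import ast


def assertion_stats(source: str) -> dict:
    """Map each test function name to (length in lines, assertions, density).

    Assumptions: tests are functions whose names start with "test", and an
    assertion is either an `assert` statement or a method call whose name
    starts with "assert" (for example `self.assertEqual`).
    """
    stats = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test"):
            assertions = 0
            for child in ast.walk(node):
                if isinstance(child, ast.Assert):
                    assertions += 1
                elif (
                    isinstance(child, ast.Call)
                    and isinstance(child.func, ast.Attribute)
                    and child.func.attr.startswith("assert")
                ):
                    assertions += 1
            # end_lineno is available on Python 3.8 and later.
            length = node.end_lineno - node.lineno + 1
            stats[node.name] = (length, assertions, assertions / length)
    return stats


SAMPLE = """
def test_add():
    result = 1 + 1
    assert result == 2
    assert result > 0
"""

if __name__ == "__main__":
    for name, (length, count, density) in assertion_stats(SAMPLE).items():
        print(f"{name}: {length} lines, {count} assertions, density {density:.2f}")
```

Comparing these per-method figures between AI-authored and human-authored commits is one way such structural differences could be quantified.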

2 Papers