SE | May 24

How do Agents Refactor: An Empirical Study

Lukas Ottenhof, Daniel Penner, Abram Hindle, Thibaud Lutellier

arXiv:2601.20160 | 38.12 citations | h-index: 41

AI Analysis

For researchers and practitioners using AI coding agents, this work shows that current agents' Java refactorings are mostly annotation changes rather than structural improvements, and that, among the agents studied, only Cursor shows a statistically significant increase in refactoring smells.

This study presents the first analysis of agentic refactoring pull requests in Java, comparing them to developer refactorings across 86 projects per group. Results show agent refactorings are dominated by annotation changes, while only Cursor shows a statistically significant increase in refactoring smells.

Software development agents such as Claude Code, GitHub Copilot, Cursor Agent, Devin, and OpenAI Codex are being increasingly integrated into developer workflows. While prior work has evaluated agent capabilities for code completion and task automation, there is little work investigating how these agents perform Java refactoring in practice, the types of changes they make, and their impact on code quality. In this study, we present the first analysis of agentic refactoring pull requests in Java, comparing them to developer refactorings across 86 projects per group. Using RefactoringMiner and DesigniteJava 3.0, we identify refactoring types and detect code smells before and after refactoring commits. Our results show that agent refactorings are dominated by annotation changes (the 5 most common refactoring types done by agents are annotation related), in contrast to the diverse structural improvements typical of developers. Despite these differences in refactoring types, we find Cursor to be the only model to show a statistically significant increase in refactoring smells.
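As a rough illustration of the refactoring-type tally described in the abstract, the sketch below counts refactoring types in a RefactoringMiner-style JSON report and computes the share that are annotation related. It assumes the report has a top-level "commits" list whose entries carry a "refactorings" list with a "type" field. The substring match on "Annotation" is a simplifying heuristic of this sketch, not the paper's classification, and the sample report is invented for illustration only.

```python
import json
from collections import Counter


def tally_refactoring_types(report: dict) -> Counter:
    """Count how often each refactoring type appears across all commits."""
    counts = Counter()
    for commit in report.get("commits", []):
        for refactoring in commit.get("refactorings", []):
            counts[refactoring["type"]] += 1
    return counts


def annotation_share(counts: Counter) -> float:
    """Fraction of refactorings whose type name mentions 'Annotation'.

    Assumption: matching on the type name is a heuristic for this sketch.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    annotation = sum(n for t, n in counts.items() if "Annotation" in t)
    return annotation / total


# Hypothetical report, made up for illustration (not data from the study).
sample_report = json.loads("""
{
  "commits": [
    {"sha1": "aaa", "refactorings": [
      {"type": "Add Method Annotation"},
      {"type": "Add Method Annotation"}
    ]},
    {"sha1": "bbb", "refactorings": [
      {"type": "Remove Class Annotation"},
      {"type": "Extract Method"}
    ]}
  ]
}
""")

counts = tally_refactoring_types(sample_report)
share = annotation_share(counts)
print(counts.most_common(5))
print(f"annotation share: {share:.2f}")
```

Running the same tally separately on agent and developer pull requests would give the two type distributions that the study compares. The smell comparison with DesigniteJava is a separate step not covered here.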
