SE, May 4

These Aren't the Reviews You're Looking For: How Humans Review AI-Generated Pull Requests

Kacper Duma, Patryk Wróblewski, Jagoda Bobińska, Julia Winiarska, Piotr Przymus

arXiv:2605.02273

AI Analysis

For researchers and practitioners studying code review processes in AI-assisted software development, this paper highlights how review metrics may misrepresent human oversight when AI agents are involved.

The study analyzes code review interactions for AI-generated pull requests on GitHub. It finds that most receive no review at all and that those that are reviewed are predominantly reviewed by AI agents rather than humans, unlike human-authored PRs. This reveals systematic differences in how review activity is structured in agentic workflows and challenges the interpretation of review metrics as indicators of human oversight.

We analyze code review interactions for AI-generated pull requests (PRs) on GitHub using the AIDev dataset and compare them to human-authored PRs within the same repositories. We find that most AI-generated PRs receive no review and that, when they are reviewed, the review activity is largely dominated by AI agents rather than humans. Human-authored PRs are more likely to receive human-only review and to attract direct human feedback. In contrast, reviews of AI-generated PRs more often take the form of automation-mediated interaction, with human involvement frequently expressed through agent steering rather than standalone evaluation. These results indicate systematic differences in how review activity is structured in agentic workflows and raise challenges for interpreting review metrics as indicators of human oversight in large-scale mining studies.
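
To make the distinction between unreviewed, human-only, and agent-dominated review concrete, the following is a minimal sketch of how PRs might be bucketed by reviewer type. It is not the authors' method: the category names, the `"human"`/`"agent"` labels, and the input shape are all assumptions made for illustration, and the sketch does not read the AIDev dataset.

```python
from collections import Counter

# Hypothetical reviewer labels; the paper's actual coding scheme may differ.
VALID_KINDS = {"human", "agent"}


def review_category(reviewer_kinds):
    """Classify one PR by who reviewed it.

    reviewer_kinds: iterable of "human" or "agent", one entry per review event.
    """
    kinds = set(reviewer_kinds)
    unknown = kinds - VALID_KINDS
    if unknown:
        raise ValueError(f"unknown reviewer kinds: {sorted(unknown)}")
    if not kinds:
        return "no_review"
    if kinds == {"human"}:
        return "human_only"
    if kinds == {"agent"}:
        return "agent_only"
    return "mixed"


def category_shares(prs):
    """Return the share of PRs that fall into each review category."""
    counts = Counter(review_category(kinds) for kinds in prs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}
```

Under this scheme, a metric such as "share of PRs with at least one review" would count `agent_only` PRs as reviewed, which is the kind of ambiguity the abstract warns about when review metrics are read as human oversight.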
