SE, May 7

Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance

arXiv:2602.08915 47.73 citations h-index: 3
AI Analysis

For software teams choosing AI coding assistants, this study provides a task-stratified comparison showing that agent choice matters less than task type, but offers no actionable guidance beyond confirming known variability.

This paper compares five AI coding agents across 7,156 pull requests, finding that task type dominates acceptance rates (documentation 82.1% vs. new features 66.1%) and that no single agent excels universally, with Devin showing a unique positive trend over time.

The rapid adoption of AI-powered coding assistants is transforming software development practices, yet systematic comparisons of their effectiveness across different task types and over time remain limited. This paper presents an empirical study comparing five popular agents (OpenAI Codex, GitHub Copilot, Devin, Cursor, and Claude Code), analyzing 7,156 pull requests (PRs) from the AIDev dataset. Temporal trend analysis reveals heterogeneous evolution patterns: Devin exhibits the only consistent positive trend in acceptance rate (+0.77% per week over 32 weeks), whereas other agents remain largely stable. Our analysis suggests that the PR task type is a dominant factor influencing acceptance rates: documentation tasks achieve 82.1% acceptance compared to 66.1% for new features, a 16-percentage-point gap that exceeds typical inter-agent variance for most tasks. OpenAI Codex achieves consistently high acceptance rates across all nine task categories (59.6%-88.6%), with stratified chi-square tests confirming statistically significant advantages over other agents in several task categories. However, no single agent performs best across all task types: Claude Code leads in documentation (92.3%) and features (72.6%), while Cursor excels in fix tasks (80.4%).
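The two analyses named in the abstract, per-task chi-square comparisons and a weekly acceptance trend, can be illustrated with a minimal sketch. This is not the authors' code. It assumes a pairwise 2x2 Pearson test without continuity correction and an ordinary least-squares slope, which may differ from what the paper did. The counts, the weekly rates, and the names `chi2_2x2`, `ols_slope`, `agent_a`, and `agent_b` are hypothetical and used only for illustration.

```python
import math


def chi2_2x2(a_acc, a_rej, b_acc, b_rej):
    """Pearson chi-square test on a 2x2 accepted/rejected table.

    Assumption: no continuity correction. Every row and column total
    must be non-zero, otherwise an expected count of 0 divides by zero.
    """
    table = [[a_acc, a_rej], [b_acc, b_rej]]
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def ols_slope(weekly_rates):
    """Least-squares slope of acceptance rate against week index 0..n-1."""
    n = len(weekly_rates)
    x_mean = (n - 1) / 2
    y_mean = sum(weekly_rates) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(weekly_rates))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Percentage-point gap reported in the abstract (docs vs. new features).
gap_pp = round(82.1 - 66.1, 1)

# Hypothetical (accepted, rejected) counts per task type, NOT from the paper.
strata = {
    "docs": {"agent_a": (90, 10), "agent_b": (80, 20)},
    "feature": {"agent_a": (60, 40), "agent_b": (70, 30)},
}

if __name__ == "__main__":
    print(f"docs vs. features gap: {gap_pp} percentage points")
    for task, counts in strata.items():
        stat, p = chi2_2x2(*counts["agent_a"], *counts["agent_b"])
        print(f"{task}: chi2={stat:.3f}, p={p:.4f}")
    # Hypothetical weekly acceptance rates (%) for one agent.
    print(f"trend: {ols_slope([60.0, 61.0, 61.5, 62.5]):.2f} per week")
```

Testing within each task type separately, rather than on pooled counts, keeps a difference in task mix between agents from being mistaken for a difference in acceptance rate.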
