SE, AI | Feb 9

AIDev: Studying AI Coding Agents on GitHub

arXiv:2602.09185v1 | 8 citations
Originality: Synthesis-oriented
AI Analysis

The dataset is a foundational resource for studying AI adoption and human-AI collaboration in software engineering. The contribution is incremental, however, because it focuses on data collection rather than new methods.

The authors tackled the lack of a comprehensive dataset on AI coding agents in real-world projects by introducing AIDev, a large-scale dataset of 932,791 agent-authored pull requests from GitHub, spanning 116,211 repositories and 72,189 developers.

AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in real-world GitHub repositories. AIDev aggregates 932,791 Agentic-PRs produced by five agents: OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. These PRs span 116,211 repositories and involve 72,189 developers. In addition, AIDev includes a curated subset of 33,596 Agentic-PRs from 2,807 repositories with over 100 stars, providing further information such as comments, reviews, commits, and related issues. This dataset offers a foundation for future research on AI adoption, developer productivity, and human-AI collaboration in the new era of software engineering.

Keywords: AI Agent, Agentic AI, Coding Agent, Agentic Coding, Agentic Software Engineering, Agentic Engineering

