Yixin Bian

h-index6
2papers

2 Papers

SEJan 8
Analyzing Message-Code Inconsistency in AI Coding Agent-Authored Pull Requests

Jingzhi Gong, Giovanni Pinna, Yixin Bian et al.

Pull request (PR) descriptions generated by AI coding agents are the primary channel for communicating code changes to human reviewers. However, the alignment between these messages and the actual changes remains unexplored, raising concerns about the trustworthiness of AI agents. To fill this gap, we analyzed 23,247 agentic PRs across five agents using PR message-code inconsistency (PR-MCI). We contributed 974 manually annotated PRs, found 406 PRs (1.7%) exhibited high PR-MCI, and identified eight PR-MCI types, revealing that "descriptions claim unimplemented changes" was the most common issue (45.4%). Statistical tests confirmed that high-MCI PRs had 51.7% lower acceptance rates (28.3% vs. 80.0%) and took 3.5 times longer to merge (55.8 vs. 16.0 hours). Our findings suggest that unreliable PR descriptions undermine trust in AI agents, highlighting the need for PR-MCI verification mechanisms and improved PR generation to enable trustworthy human-AI collaboration.

SEOct 5, 2025
GA4GC: Greener Agent for Greener Code via Multi-Objective Configuration Optimization

Jingzhi Gong, Yixin Bian, Luis de la Cal et al.

Coding agents powered by LLMs face critical sustainability and scalability challenges in industrial deployment, with single runs consuming over 100k tokens and incurring environmental costs that may exceed optimization benefits. This paper introduces GA4GC, the first framework to systematically optimize coding agent runtime (greener agent) and code performance (greener code) trade-offs by discovering Pareto-optimal agent hyperparameters and prompt templates. Evaluation on the SWE-Perf benchmark demonstrates up to 135x hypervolume improvement, reducing agent runtime by 37.7% while improving correctness. Our findings establish temperature as the most critical hyperparameter, and provide actionable strategies to balance agent sustainability with code optimization effectiveness in industrial deployment.