AI CL LG

Mar 24, 2025

Verbal Process Supervision Elicits Better Coding Agents

Hao-Yuan Chen, Cheng-Pong Huang, Jui-Ming Yao

arXiv:2503.18494v1

1 citation

h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses complex software engineering challenges for AI coding agents, representing an incremental step forward in integrating reasoning-driven architectures with LLM-based code generation.

The paper addresses the difficulty large language models have with complex software engineering tasks. It introduces CURA, a code understanding and reasoning agent system enhanced with verbal process supervision (VPS), which achieves a 3.65% improvement over baseline models on challenging benchmarks like BigCodeBench. Paired with the o3-mini model and VPS techniques, CURA attains state-of-the-art performance.

The emergence of large language models and their applications as AI agents has significantly advanced state-of-the-art code generation benchmarks, transforming modern software engineering tasks. However, even with reasoning models that use test-time compute, these systems still struggle with complex software engineering challenges. This work introduces CURA, a code understanding and reasoning agent system enhanced with verbal process supervision (VPS), achieving a 3.65% improvement over baseline models on challenging benchmarks like BigCodeBench. Furthermore, CURA, when paired with the o3-mini model and VPS techniques, attains state-of-the-art performance. This work represents a step forward in integrating reasoning-driven architectures with LLM-based code generation, enabling agentic reasoning for language models to solve complex software engineering tasks.
