AI · CL · LG · Mar 24, 2025

Verbal Process Supervision Elicits Better Coding Agents

arXiv:2503.18494v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses complex software engineering challenges for AI coding agents, representing an incremental step forward in integrating reasoning-driven architectures with LLM-based code generation.

The paper addresses the difficulty that large language models have with complex software engineering tasks. It introduces CURA, a code understanding and reasoning agent system enhanced with verbal process supervision, which achieves a 3.65% improvement over baseline models on challenging benchmarks such as BigCodeBench and attains state-of-the-art performance when paired with the o3-mini model.

The emergence of large language models and their application as AI agents has significantly advanced the state of the art on code generation benchmarks, transforming modern software engineering tasks. However, even reasoning models that use test-time compute still struggle with complex software engineering challenges. This work introduces CURA, a code understanding and reasoning agent system enhanced with verbal process supervision (VPS), achieving a 3.65% improvement over baseline models on challenging benchmarks like BigCodeBench. Furthermore, CURA, when paired with the o3-mini model and VPS techniques, attains state-of-the-art performance. This work represents a step forward in integrating reasoning-driven architectures with LLM-based code generation, enabling agentic reasoning for language models to solve complex software engineering tasks.

