CLMay 23

CP-Agent: A Calibrated Risk-Controlled Agent for Feedback-Driven Competitive Programming

arXiv:2605.2469345.8
AI Analysis

For LLM-based competitive programming agents, CP-Agent provides a principled, cost-efficient method to improve accuracy while controlling false admission risk.

CP-Agent introduces a calibrated risk-controlled framework for feedback-driven competitive programming, raising Pass@1 from 25.8% to 48.5% on LiveCodeBench Pro and improving Refine@5 by 11.0% on ICPC-Eval without parameter updates.

Large language models still struggle with contest-level programming, while many agentic remedies rely on massive inference-time sampling or expensive multi-stage post-training. We study when execution feedback reliably helps an LLM CP solver and which mechanisms govern the gains. We model feedback-driven solving as a calibrated stopped process and identify three quantities: false-admission risk, program-level evidence against bad programs, and the active-state success hazard. Under held-out trace calibration and selection from a pre-declared finite controller manifest, the resulting structural certificate lower-bounds the clean success probability before false admission. We instantiate mechanisms targeting these quantities as Dual-Granularity Verification, Test Augmentation, and Experience-Driven Self-Evolving, yielding CP-Agent. Without updating any parameters, CP-Agent raises Pass@1 from 25.8\% to 48.5\% on LiveCodeBench Pro and improves Refine@5 by 11.0\% on ICPC-Eval. Across three LLM backbones, CP-Agent lies on the cost--accuracy efficiency frontier, and ablations show that each component primarily affects its corresponding certificate quantity.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes