CP-Agent: A Calibrated Risk-Controlled Agent for Feedback-Driven Competitive Programming
For LLM-based competitive programming agents, CP-Agent provides a principled, cost-efficient method to improve accuracy while controlling false-admission risk.
CP-Agent introduces a calibrated risk-controlled framework for feedback-driven competitive programming, raising Pass@1 from 25.8% to 48.5% on LiveCodeBench Pro and improving Refine@5 by 11.0% on ICPC-Eval without parameter updates.
Large language models still struggle with contest-level programming, while many agentic remedies rely on massive inference-time sampling or expensive multi-stage post-training. We study when execution feedback reliably helps an LLM competitive programming (CP) solver and which mechanisms govern the gains. We model feedback-driven solving as a calibrated stopped process and identify three quantities: false-admission risk, program-level evidence against bad programs, and the active-state success hazard. Under held-out trace calibration and selection from a pre-declared finite controller manifest, the resulting structural certificate lower-bounds the clean success probability before false admission. We instantiate mechanisms targeting these quantities as Dual-Granularity Verification, Test Augmentation, and Experience-Driven Self-Evolving, yielding CP-Agent. Without updating any parameters, CP-Agent raises Pass@1 from 25.8% to 48.5% on LiveCodeBench Pro and improves Refine@5 by 11.0% on ICPC-Eval. Across three LLM backbones, CP-Agent lies on the cost-accuracy efficiency frontier, and ablations show that each component primarily affects its corresponding certificate quantity.
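To make the idea of held-out calibration over a pre-declared finite manifest more concrete, the sketch below shows one generic way such a setup can yield a valid lower bound: a one-sided Hoeffding bound combined with a union bound over the candidate controllers. This is an illustrative assumption, not the paper's structural certificate, whose exact form is not given here. The function names, the controller names, and the encoding of each held-out trace as a 0/1 "clean success" outcome are all hypothetical.

```python
import math


def hoeffding_lower_bound(successes, n, delta, num_controllers):
    """Lower bound on a success probability from n held-out 0/1 outcomes.

    Uses a one-sided Hoeffding bound with the failure budget delta split
    evenly across num_controllers (union bound), so the bound holds for
    every controller in the manifest simultaneously with probability
    at least 1 - delta. Generic illustration only, not the paper's certificate.
    """
    if n <= 0:
        raise ValueError("need at least one held-out trace")
    eps = math.sqrt(math.log(num_controllers / delta) / (2 * n))
    return max(0.0, successes / n - eps)


def select_controller(calibration, delta=0.05):
    """Pick the controller with the best simultaneous lower bound.

    calibration maps a (hypothetical) controller name to a list of 0/1
    outcomes on held-out traces, where 1 means the trace reached a correct
    solution with no false admission before it.
    """
    k = len(calibration)
    bounds = {
        name: hoeffding_lower_bound(sum(outcomes), len(outcomes), delta, k)
        for name, outcomes in calibration.items()
    }
    best = max(bounds, key=bounds.get)
    return best, bounds[best]


# Hypothetical manifest with two controllers and 100 held-out traces each.
manifest = {
    "controller_a": [1] * 80 + [0] * 20,
    "controller_b": [1] * 50 + [0] * 50,
}
best_name, best_bound = select_controller(manifest, delta=0.05)
```

Because the manifest is fixed before calibration, selecting the best controller after seeing the held-out outcomes does not invalidate the bound: the union bound already pays for every candidate.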