SE AI CL PL
Sep 29, 2025

AutoCode: LLMs as Problem Setters for Competitive Programming

Shang Zhou, Zihan Zheng, Kaiyuan Liu, Zeyu Shen, Zerui Cheng, Zexing Chen, Hansen He, Jianzhu Yao, Huanzhi Mao, Qiuyang Mang, Tianfu Fu, Beichen Li

arXiv:2510.12803v1
15.98 citations
h-index: 55

Originality: Incremental advance

AI Analysis

AutoCode addresses the labor-intensive task of creating competitive programming problems, offering a scalable solution for contest organizers and educators, though its contribution is incremental in that it applies LLMs to a specific domain.

The paper tackles the challenge of generating competitive programming problems with large language models. On held-out problems, its test suites approach 99% consistency with official judgments, compared with less than 81% for prior state-of-the-art methods such as HardTests. It also produces novel problems that Grandmaster-level competitive programmers judged to be of contest quality.

Writing competitive programming problems is exacting. Authors must: set constraints, input distributions, and edge cases that rule out shortcuts; target specific algorithms (e.g., max-flow, dynamic programming, data structures); and calibrate complexity beyond the reach of most competitors. We argue that this makes for an ideal test of general large language model capabilities and study whether they can do this reliably. We introduce AutoCode, which uses multiple rounds of validation to yield competition-grade problem statements and test cases. On held-out problems, AutoCode test suites approach 99% consistency with official judgments, a significant improvement over current state-of-the-art methods like HardTests, which achieve less than 81%. Furthermore, starting with a random seed problem, AutoCode can create novel variants with reference and brute-force solutions. By cross-verifying these generated solutions against test cases, we can further filter out malformed problems. Our system ensures high correctness, as verified by human experts. AutoCode successfully produces novel problems judged by Grandmaster-level (top 0.3%) competitive programmers to be of contest quality.
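The two checks the abstract mentions, consistency with official judgments and cross-verification of a reference solution against a brute-force one, can be sketched as follows. This is a minimal illustration and not AutoCode's implementation: the function names `consistency` and `cross_verify` are hypothetical, "consistency" is assumed here to mean the fraction of submissions whose verdict under the generated tests matches the official verdict, and maximum subarray sum is used only as a stand-in problem.

```python
from typing import Callable, Iterable, List, Sequence


def consistency(official: Sequence[bool], generated: Sequence[bool]) -> float:
    """Fraction of submissions on which two verdict lists agree.

    Assumption: a verdict is True for "accepted" and False for "rejected".
    """
    if len(official) != len(generated):
        raise ValueError("verdict lists must have the same length")
    if not official:
        raise ValueError("need at least one submission")
    agree = sum(o == g for o, g in zip(official, generated))
    return agree / len(official)


def cross_verify(
    reference: Callable[[List[int]], int],
    brute_force: Callable[[List[int]], int],
    inputs: Iterable[List[int]],
) -> bool:
    """Return True only if both solutions agree on every input."""
    return all(reference(x) == brute_force(x) for x in inputs)


def max_subarray_reference(a: List[int]) -> int:
    """Kadane's algorithm for the maximum sum of a non-empty subarray."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def max_subarray_brute(a: List[int]) -> int:
    """Try every non-empty subarray; slow but easy to trust."""
    n = len(a)
    return max(sum(a[i:j]) for i in range(n) for j in range(i + 1, n + 1))


def max_subarray_buggy(a: List[int]) -> int:
    """Deliberately wrong: allows the empty subarray, so all-negative inputs give 0."""
    best = cur = 0
    for x in a:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best


# Small hand-picked inputs, including an all-negative edge case.
SMALL_INPUTS = [[1, -2, 3, 4], [-3, -1], [5], [2, -1, 2, -5, 4]]
```

With these definitions, `cross_verify(max_subarray_reference, max_subarray_brute, SMALL_INPUTS)` holds, while swapping in `max_subarray_buggy` fails on the all-negative input `[-3, -1]`. A generated problem whose reference and brute-force solutions disagree like this would be a candidate for filtering out.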
