Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

Guillaume Baudart, Marc Lelarge, Tristan Stérin, Jules Viennot

arXiv:2603.2040569.7h-index: 33

AI Analysis

This work addresses the challenge of automated theorem proving in competitive mathematics, representing an incremental advance by applying existing methods to new data.

The researchers tackled the problem of autonomously proving Putnam competition problems using an AI agent, achieving a result of 10 out of 12 problems solved with Claude Opus 4.6 and Rocq-MCP tools, consuming 1.9 billion tokens over 17.7 hours of active compute.

We report on an experiment in which Claude Opus~4.6, equipped with a suite of Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical Competition. The MCP tools, designed with Claude by analyzing logs from a prior experiment on miniF2F-Rocq, encode a "compile-first, interactive-fallback" strategy. Running on an isolated VM with no internet access, the agent deployed 141 subagents over 17.7 hours of active compute (51.6h wall-clock), consuming approximately 1.9 billion tokens. All proofs are publicly available.

View on arXiv PDF

Similar