Tristan StÃ©rin

69.7LGMar 20

Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

Guillaume Baudart, Marc Lelarge, Tristan Stérin et al.

We report on an experiment in which Claude Opus~4.6, equipped with a suite of Model Context Protocol (MCP) tools for the Rocq proof assistant, autonomously proved 10 of 12 problems from the 2025 Putnam Mathematical Competition. The MCP tools, designed with Claude by analyzing logs from a prior experiment on miniF2F-Rocq, encode a "compile-first, interactive-fallback" strategy. Running on an isolated VM with no internet access, the agent deployed 141 subagents over 17.7 hours of active compute (51.6h wall-clock), consuming approximately 1.9 billion tokens. All proofs are publicly available.

94.7LOMar 23

Determination of the fifth Busy Beaver value

The bbchallenge Collaboration, Justin Blanchard, Daniel Briggs et al.

The Busy Beaver value $S(n)$ is the maximum number of steps that an $n$-state 2-symbol Turing machine can perform from the all-zero tape before halting. $S$ was historically introduced by Tibor RadÃ³ in 1962 as one of the simplest examples of an uncomputable function. We prove that $S(5) = 47,176,870$ using the Coq proof assistant. The proof enumerates $181,385,789$ Turing machines with 5 states and, for each machine, decides whether it halts or not. Our result marks the first determination of a new Busy Beaver value in over 40 years and the first Busy Beaver value ever to be formally verified, attesting to the effectiveness of massively collaborative online research (bbchallenge$.$org).

Tristan StÃ©rin

2 Papers