Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
This work advances automated mathematical reasoning on hard theorem-proving problems, including IMO-level contest problems, by combining formal verification with iterative proof refinement. Its building blocks, such as reinforcement learning against Lean, come from existing methods, so the contribution is incremental in that respect.
The authors address automated theorem proving for contest-level mathematics with Seed-Prover, a lemma-style whole-proof reasoning model that iteratively refines its proofs using Lean feedback, previously proved lemmas, and self-summarization. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50% on PutnamBench, outperforming the previous state of the art by a large margin.
LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine, Seed-Geometry, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.
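
The refinement loop described in the abstract (propose a proof, check it, then feed the checker's feedback, any proved lemmas, and a short summary into the next attempt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ProverState`, `refine`, `generate`, `check`, and `max_rounds` are hypothetical names, the model and the Lean checker are replaced by plain callables, and the toy stubs at the end exist only to make the sketch runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ProverState:
    """Context carried between attempts (all field names are hypothetical)."""
    proved_lemmas: List[str] = field(default_factory=list)
    feedback: str = ""
    summary: str = ""


# generate(statement, state) -> candidate proof text (stands in for the model)
Generator = Callable[[str, ProverState], str]
# check(proof) -> (accepted, checker message, lemmas that verified on their own)
# (stands in for a formal checker such as Lean)
Checker = Callable[[str], Tuple[bool, str, List[str]]]


def refine(statement: str, generate: Generator, check: Checker,
           max_rounds: int = 8) -> Optional[str]:
    """Return the first accepted proof, or None if the round budget runs out."""
    state = ProverState()
    for _ in range(max_rounds):
        proof = generate(statement, state)
        accepted, message, lemmas = check(proof)
        if accepted:
            return proof
        # Keep lemmas that verified even though the whole proof failed.
        for lemma in lemmas:
            if lemma not in state.proved_lemmas:
                state.proved_lemmas.append(lemma)
        # Record the checker feedback and a short summary for the next attempt.
        state.feedback = message
        state.summary = (f"{len(state.proved_lemmas)} lemma(s) proved; "
                         f"last error: {message}")
    return None


# Toy stubs: the "model" succeeds only once a proved lemma is available.
def toy_generate(statement: str, state: ProverState) -> str:
    return "full proof" if state.proved_lemmas else "attempt with lemma_1"


def toy_check(proof: str) -> Tuple[bool, str, List[str]]:
    if proof == "full proof":
        return True, "", []
    return False, "unsolved goals", ["lemma_1"]
```

With these stubs, `refine("toy statement", toy_generate, toy_check)` fails on the first round, stores `lemma_1`, and is accepted on the second round. Passing the generator and checker as callables keeps the loop independent of any particular model or proof assistant interface, which is a design choice of this sketch rather than something stated in the source.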