AILGJul 3, 2025

Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving

arXiv:2507.02726v16 citationsh-index: 30
Originality Highly original
AI Analysis

This addresses the problem of sparse rewards and large-scale proofs in automated theorem proving for AI systems, representing a strong domain-specific advance.

The paper tackles the challenge of automated theorem proving for complex reasoning tasks by introducing self-generated goal-conditioned MDPs to structure subgoal generation, resulting in Bourbaki (7B) solving 26 problems on PutnamBench and achieving state-of-the-art results at that scale.

Reasoning remains a challenging task for large language models (LLMs), especially within the logically constrained environment of automated theorem proving (ATP), due to sparse rewards and the vast scale of proofs. These challenges are amplified in benchmarks like PutnamBench, which contains university-level problems requiring complex, multi-step reasoning. To address this, we introduce self-generated goal-conditioned MDPs (sG-MDPs), a new framework in which agents generate and pursue their subgoals based on the evolving proof state. Given this more structured generation of goals, the resulting problem becomes more amenable to search. We then apply Monte Carlo Tree Search (MCTS)-like algorithms to solve the sG-MDP, instantiating our approach in Bourbaki (7B), a modular system that can ensemble multiple 7B LLMs for subgoal generation and tactic synthesis. On PutnamBench, Bourbaki (7B) solves 26 problems, achieving new state-of-the-art results with models at this scale.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes