ML | LG | Feb 11

Why Agentic Theorem Prover Works: A Statistical Provability Theory of Mathematical Reasoning Models

arXiv:2602.10538v1
h-index: 3
Originality: Incremental advance
AI Analysis

The paper gives a principled explanation for the success of agentic theorem provers in mathematical reasoning. The advance is incremental, since it builds on existing pipeline frameworks.

The paper asks why agentic theorem provers succeed despite the classical hardness of proof search. It introduces a statistical provability theory that explains their performance on real-world problem distributions and bounds the performance gap of score-guided planning methods.

Agentic theorem provers -- pipelines that couple a mathematical reasoning model with library retrieval, subgoal-decomposition/search planner, and a proof assistant verifier -- have recently achieved striking empirical success, yet it remains unclear which components drive performance and why such systems work at all despite classical hardness of proof search. We propose a distributional viewpoint and introduce **statistical provability**, defined as the finite-horizon success probability of reaching a verified proof, averaged over an instance distribution, and formalize modern theorem-proving pipelines as time-bounded MDPs. Exploiting Bellman structure, we prove existence of optimal policies under mild regularity, derive provability certificates via sub-/super-solution inequalities, and bound the performance gap of score-guided planning (greedy/top-\(k\)/beam/rollouts) in terms of approximation error, sequential statistical complexity, representation geometry (metric entropy/doubling structure), and action-gap margin tails. Together, our theory provides a principled, component-sensitive explanation of when and why agentic theorem provers succeed on biased real-world problem distributions, while clarifying limitations in worst-case or adversarial regimes.
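To make the definition of statistical provability concrete, the following is a minimal sketch of a finite-horizon Bellman recursion on a toy three-state MDP. Everything in it is an illustrative assumption rather than the paper's construction: the states (`OPEN`, `SUBGOAL`, `PROVED`), the two actions, the transition probabilities, and the function names `success_values` and `statistical_provability` are hypothetical. It computes the optimal probability of reaching a verified proof within a step budget, then averages that value over an instance distribution `mu`.

```python
# Toy illustration only: states, actions, and probabilities are invented
# for this sketch and do not come from the paper.
OPEN, SUBGOAL, PROVED = 0, 1, 2

# P[action][state][next_state]: transition probabilities.
# Action 0 = "try a direct tactic", action 1 = "decompose into a subgoal".
P = [
    [  # action 0
        [0.7, 0.0, 0.3],  # OPEN: closes the goal with probability 0.3
        [0.0, 0.5, 0.5],  # SUBGOAL: closes with probability 0.5
        [0.0, 0.0, 1.0],  # PROVED is absorbing
    ],
    [  # action 1
        [0.1, 0.9, 0.0],  # OPEN: usually moves to the easier SUBGOAL
        [0.0, 0.5, 0.5],  # SUBGOAL: same as action 0 in this toy
        [0.0, 0.0, 1.0],  # PROVED is absorbing
    ],
]


def success_values(P, goal, horizon):
    """Optimal probability of reaching `goal` within `horizon` steps.

    Backward recursion: V_0(s) = 1 if s is the goal, else 0, and
    V_t(s) = max over actions of sum_s' P(s' | s, a) * V_{t-1}(s').
    """
    n_states = len(P[0])
    V = [1.0 if s == goal else 0.0 for s in range(n_states)]
    for _ in range(horizon):
        V = [
            1.0 if s == goal else max(
                sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(len(P))
            )
            for s in range(n_states)
        ]
    return V


def statistical_provability(mu, V):
    """Average the finite-horizon success probability over instances."""
    return sum(m * v for m, v in zip(mu, V))


# Instance distribution: half the problems start OPEN, half at SUBGOAL.
mu = [0.5, 0.5, 0.0]
V2 = success_values(P, PROVED, horizon=2)
provability = statistical_provability(mu, V2)
```

In this toy, a larger horizon can only increase the success values, which mirrors the role of the time budget in a time-bounded formulation. The averaging step shows why a distributional measure can be high even when individual hard instances have low success probability.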

