AI CL LG LO PLJul 15, 2024

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, Swarat Chaudhuri

arXiv:2407.11214v238.4145 citationsh-index: 10Has Code

Originality Synthesis-oriented

AI Analysis

This provides a new benchmark for evaluating neural theorem-provers in competition mathematics, though it is incremental as it builds on existing formalization efforts.

The authors introduced PutnamBench, a benchmark of 1692 formalized problems from the Putnam Mathematical Competition, to evaluate neural theorem-provers, finding that current methods can only solve a handful of these problems, establishing it as a difficult challenge.

We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.

View on arXiv PDF Code

Similar