AI · CL · Dec 30, 2024

HunyuanProver: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving

arXiv:2412.20735v3 · 30 citations · h-index: 19 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses data scarcity for researchers in automated theorem proving, offering incremental improvements with a scalable synthesis method.

The authors tackled the problem of data sparsity in automated theorem proving by introducing HunyuanProver, a framework that synthesizes data and uses guided tree search, achieving a state-of-the-art pass rate of 68.4% on the miniF2F-test benchmark and proving 4 IMO statements.

We introduce HunyuanProver, a language model fine-tuned from Hunyuan 7B for interactive automated theorem proving with LEAN4. To alleviate the data sparsity issue, we design a scalable framework that iteratively synthesizes data at low cost. In addition, guided tree search algorithms are designed to enable effective "system 2 thinking" in the prover. HunyuanProver achieves state-of-the-art (SOTA) performance on major benchmarks. Specifically, it achieves a pass rate of 68.4% on miniF2F-test, compared with 65.9% for the current SOTA result. It proves 4 IMO statements (imo_1960_p2, imo_1962_p2, imo_1964_p2 and imo_1983_p6) in miniF2F-test. To benefit the community, we will open-source a dataset of 30k synthesized instances, where each instance contains the original question in natural language, the statement converted by autoformalization, and the proof by HunyuanProver.
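
The abstract does not say which guided tree search algorithms are used, so the following is only a minimal sketch of one generic form of guided search (best-first search), not the authors' method. It assumes a toy setting in which a "proof state" is an integer, a "tactic" is an arithmetic step, and a hand-written `score` function stands in for a learned critic. All names (`best_first_search`, `expand`, `score`, `TARGET`) are hypothetical.

```python
import heapq
from typing import Callable, Iterable, Optional

def best_first_search(
    root: int,
    is_proved: Callable[[int], bool],
    expand: Callable[[int], Iterable[tuple[str, int]]],
    score: Callable[[int], float],
    max_expansions: int = 1000,
) -> Optional[list[str]]:
    """Expand the highest-scoring state first; return the tactic sequence or None."""
    # Frontier entries: (negated score, tie-breaker, state, tactics applied so far).
    # heapq is a min-heap, so the score is negated to pop the best state first.
    frontier = [(-score(root), 0, root, [])]
    seen = {root}
    counter = 1  # unique tie-breaker so states and paths are never compared
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, state, path = heapq.heappop(frontier)
        if is_proved(state):
            return path
        for tactic, child in expand(state):
            if child in seen:
                continue
            seen.add(child)
            heapq.heappush(frontier, (-score(child), counter, child, path + [tactic]))
            counter += 1
    return None

# Toy stand-in for a theorem: reach TARGET from 1 using two "tactics".
TARGET = 10

def expand(state: int) -> list[tuple[str, int]]:
    # A real prover would sample candidate tactics from a policy model
    # and check them with the proof assistant; here the steps are fixed.
    candidates = [("add_one", state + 1), ("double", state * 2)]
    return [(name, nxt) for name, nxt in candidates if nxt <= TARGET]

def score(state: int) -> float:
    # Stand-in for a critic: states closer to the target score higher.
    return -abs(TARGET - state)

proof = best_first_search(1, lambda s: s == TARGET, expand, score)
print(proof)
```

In this sketch the critic alone decides which state to expand next; the `max_expansions` budget bounds the search when no proof is found.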
