LG AIJan 30, 2025

ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis

Haoxiong Liu, Jiacheng Sun, Zhenguo Li, Andrew C Yao

arXiv:2501.18310v211 citationsh-index: 14Has CodeICML

AI Analysis

This work addresses the challenge of improving efficiency and performance in neural theorem proving for automated reasoning systems, representing an incremental advancement over prior methods.

The paper tackles the problem of inefficient neural theorem proving by proposing ProofAug, a method that integrates automation tools at multiple granularities through fine-grained proof structure analysis, achieving a cumulative pass rate of 66.0% on the miniF2F benchmark with 2100 queries per problem, compared to 56.1% for the previous state-of-the-art.

The synergy between deep learning models and traditional automation tools, such as built-in tactics of the proof assistant and off-the-shelf automated theorem provers, plays a crucial role in developing robust and efficient neural theorem provers(NTPs). However, for proof synthesis with LLMs, previous work applies automation tools either only when explicitly invoked by the model or at a single granularity level, failing to fully exploit their power. To solve this issue, we propose ProofAug, a procedure that equips LLMs with automation methods at various granularities through fine-grained structure analysis of model-generated proof proposals. ProofAug also serves as a versatile plug-and-play module that seamlessly integrates with any tree-search algorithm, enabling our construction of an efficient recursive proving (ERP) module to further enhance performance. The superiority of our method is validated on the miniF2F benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. Notably, by additionally employing a mixed prompting strategy, we achieve a cumulative pass rate of 66.0% after curation of the dataset (61.9% for the original version) with 2100 queries to the model per problem (In contrast, the previous SOTA in Isabelle, Subgoal-XL, only achieves 56.1% using 16384 queries per problem). We also implement a Lean 4 version of ProofAug that can improve the pass@1 performance of Kimina-Prover-Preview-Distill-1.5B from 44.3% to 50.4% on miniF2F-test. Our code is available at https://github.com/haoxiongliu/ProofAug.

View on arXiv PDF Code

Similar