Learning to Prove from Synthetic Theorems
This addresses a key data bottleneck for researchers in automated theorem proving, offering a promising incremental improvement by leveraging synthetic data to enhance prover performance.
The paper tackles the scarcity of training data in automated theorem proving by using synthetic theorems generated from axioms, showing that a prover trained on this data solves a substantial fraction of problems in the TPTP benchmark and outperforms models trained on human-generated problems in most axiom sets.
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems. We demonstrate that a prover trained exclusively on synthetic theorems can solve a substantial fraction of problems in TPTP, a benchmark dataset that is used to compare state-of-the-art heuristic provers. Our approach outperforms a model trained on human-generated problems in most axiom sets, thereby showing the promise of using synthetic data for this task.