Training a First-Order Theorem Prover from Synthetic Data
This addresses the data bottleneck for researchers in automated theorem proving, though it is incremental as it builds on existing saturation-based methods.
The authors tackled the scarcity of training data in automated theorem proving by training a neurally-guided prover using synthetically generated theorems, achieving state-of-the-art performance on synthetic data and solving 72% of first-order problems without equality on the TPTP library.
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms. We use these theorems to train a neurally-guided saturation-based prover. Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps, and shows significant transfer to the unseen human-written theorems from the TPTP library, where it solves 72\% of first-order problems without equality.