Efficient Neural Clause-Selection Reinforcement
This work addresses the challenge of automating clause selection in theorem proving. The contribution is incremental: it builds on existing RL approaches to enhance a specific prover.
The paper frames clause selection in theorem proving as a reinforcement learning task and trains a neural network that solves 20% more problems than a baseline strategy under a short CPU instruction limit.
Clause selection is arguably the most important choice point in saturation-based theorem proving. Framing it as a reinforcement learning (RL) task is a way to challenge the human-designed heuristics of state-of-the-art provers and to instead automatically evolve -- just from prover experiences -- their potentially optimal replacement. In this work, we present a neural network architecture for scoring clauses for clause selection that is powerful yet efficient to evaluate. Following RL principles to make design decisions, we integrate the network into the Vampire theorem prover and train it from successful proof attempts. An experiment on the diverse TPTP benchmark shows that the neurally guided prover improves over the baseline strategy from which it initially learns: under a practically relevant, short CPU instruction limit, it solves 20% more problems that were not seen during training.
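
To make the role of clause selection concrete, the following is a minimal sketch of a given-clause saturation loop in which a pluggable scoring function decides which clause is processed next. It is a propositional toy under stated assumptions, not Vampire's implementation or the network described here: the names `saturate`, `resolvents`, and `score` are hypothetical, clauses are sets of integer literals, and the only inference rule is binary resolution. A learned scorer would take the place of `score`.

```python
import heapq
from itertools import count
from typing import Callable, FrozenSet, Iterable, List, Optional

# Assumption: a clause is a set of nonzero ints; -n is the negation of n.
Clause = FrozenSet[int]


def resolvents(a: Clause, b: Clause) -> List[Clause]:
    """All binary resolvents of two propositional clauses, without tautologies."""
    out = []
    for lit in a:
        if -lit in b:
            r = frozenset((a - {lit}) | (b - {-lit}))
            if not any(-x in r for x in r):
                out.append(r)
    return out


def saturate(axioms: Iterable[Iterable[int]],
             score: Callable[[Clause], float],
             max_steps: int = 10_000) -> Optional[int]:
    """Toy given-clause loop.

    Returns the number of clause selections made when the empty clause is
    found, or None if the clause set saturates or max_steps is exhausted.
    Lower scores are selected first.
    """
    tie = count()            # tie-breaker so clauses themselves are never compared
    passive: list = []       # priority queue of unprocessed clauses
    seen = set()
    for c in axioms:
        c = frozenset(c)
        if c not in seen:
            seen.add(c)
            heapq.heappush(passive, (score(c), next(tie), c))
    active: List[Clause] = []
    for step in range(1, max_steps + 1):
        if not passive:
            return None      # saturated without a refutation
        _, _, given = heapq.heappop(passive)   # the clause-selection choice point
        if not given:
            return step      # the empty clause was among the inputs
        active.append(given)
        for other in active:
            for r in resolvents(given, other):
                if not r:
                    return step   # empty clause derived: refutation found
                if r not in seen:
                    seen.add(r)
                    heapq.heappush(passive, (score(r), next(tie), r))
    return None


# Usage: a hand-written heuristic (prefer shorter clauses) as the scorer.
steps = saturate([{1}, {-1, 2}, {-2}], score=len)
```

In this sketch the scorer is called once per generated clause, so its evaluation cost is paid on every inference. That is one reason a clause-scoring model has to be cheap to evaluate when the prover runs under a tight resource limit.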