Machine-Learned Premise Selection for Lean
The tool aims to make theorem proving more efficient for users of the Lean proof assistant. It is an incremental improvement that integrates machine learning into an existing system.
The authors address premise selection in the Lean proof assistant with a machine-learning tool that suggests relevant premises during proof construction. The approach is lightweight and fast, relying on a custom random forest model implemented directly in Lean 4.
We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, and (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashion. It is implemented directly in Lean, which was possible thanks to the rich and efficient metaprogramming features of Lean 4. The random forest is trained on data extracted from mathlib, Lean's mathematics library. We experiment with various options for producing training features and labels. The advice from a trained model is accessible to the user via the suggest_premises tactic, which can be called in an editor while constructing a proof interactively.
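To make the idea of online-trained premise selection concrete, the following is a minimal Python sketch. It is not the tool's implementation: the actual model is a custom random forest written in Lean 4, whereas this sketch swaps in a much simpler feature/premise co-occurrence ranker. The class name `OnlinePremiseRanker`, its methods, and the toy features (symbol names taken from a theorem statement) are all hypothetical and chosen only for illustration. What it shares with the described approach is the interface: the model is updated one example at a time, and for a new goal it returns a ranked list of candidate premises.

```python
from collections import defaultdict


class OnlinePremiseRanker:
    """Toy co-occurrence ranker (hypothetical stand-in, NOT the paper's random forest)."""

    def __init__(self):
        # cooc[feature][premise]: number of examples in which both appeared
        self.cooc = defaultdict(lambda: defaultdict(int))
        # feature_count[feature]: number of examples containing the feature
        self.feature_count = defaultdict(int)

    def update(self, features, premises):
        """Learn from a single (features, premises) example, without retraining."""
        for f in set(features):
            self.feature_count[f] += 1
            for p in set(premises):
                self.cooc[f][p] += 1

    def suggest(self, features, k=3):
        """Return up to k premises, ranked by summed co-occurrence frequency."""
        scores = defaultdict(float)
        for f in set(features):
            seen = self.feature_count.get(f, 0)
            if seen == 0:
                continue  # a feature never seen in training carries no signal
            for p, n in self.cooc[f].items():
                scores[p] += n / seen
        # Sort by descending score, then by name so that ties are deterministic.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [p for p, _ in ranked[:k]]


# Toy training data: features are symbols in a statement, labels are premises
# used in its proof (both invented for this example).
ranker = OnlinePremiseRanker()
ranker.update({"Nat", "add", "comm"}, {"Nat.add_comm"})
ranker.update({"Nat", "mul", "comm"}, {"Nat.mul_comm"})
ranker.update({"Nat", "add", "zero"}, {"Nat.add_zero"})

suggestions = ranker.suggest({"Nat", "add", "comm"}, k=2)
print(suggestions)
```

Because `update` only increments counters, new proofs can be folded in as they become available, which is the property that online training is meant to provide. A tree-based model such as the one described above would replace the counting step with incremental updates to the trees.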