LG AI CL LOOct 24, 2025

Combining Textual and Structural Information for Premise Selection in Lean

Job Petrovčič, David Eliecer Narvaez Denis, Ljupčo Todorovski

arXiv:2510.23637v12 citationsh-index: 33

Originality Incremental advance

AI Analysis

This addresses a key bottleneck for scaling theorem proving in large formal libraries, offering a domain-specific incremental improvement.

The paper tackled the problem of premise selection in theorem proving by combining textual embeddings with graph neural networks over dependency graphs, resulting in a 25% improvement over the baseline on the LeanDojo Benchmark.

Premise selection is a key bottleneck for scaling theorem proving in large formal libraries. Yet existing language-based methods often treat premises in isolation, ignoring the web of dependencies that connects them. We present a graph-augmented approach that combines dense text embeddings of Lean formalizations with graph neural networks over a heterogeneous dependency graph capturing both state--premise and premise--premise relations. On the LeanDojo Benchmark, our method outperforms the ReProver language-based baseline by over 25% across standard retrieval metrics. These results demonstrate the power of relational information for more effective premise selection.

View on arXiv PDF

Similar