LGAICLLOOct 24, 2025

Combining Textual and Structural Information for Premise Selection in Lean

arXiv:2510.23637v12 citationsh-index: 33
Originality Incremental advance
AI Analysis

This addresses a key bottleneck for scaling theorem proving in large formal libraries, offering a domain-specific incremental improvement.

The paper tackled the problem of premise selection in theorem proving by combining textual embeddings with graph neural networks over dependency graphs, resulting in a 25% improvement over the baseline on the LeanDojo Benchmark.

Premise selection is a key bottleneck for scaling theorem proving in large formal libraries. Yet existing language-based methods often treat premises in isolation, ignoring the web of dependencies that connects them. We present a graph-augmented approach that combines dense text embeddings of Lean formalizations with graph neural networks over a heterogeneous dependency graph capturing both state--premise and premise--premise relations. On the LeanDojo Benchmark, our method outperforms the ReProver language-based baseline by over 25% across standard retrieval metrics. These results demonstrate the power of relational information for more effective premise selection.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes