AICLLGAug 5, 2024

miniCTX: Neural Theorem Proving with (Long-)Contexts

CMU
arXiv:2408.03350v333 citationsh-index: 34
AI Analysis

This addresses the challenge of realistic theorem proving for AI systems by providing a more accurate evaluation, though it is incremental as it builds on existing neural methods with a new benchmark.

The authors tackled the problem of neural theorem proving in formal mathematics by introducing miniCTX, a benchmark that tests models' ability to prove theorems using new, long-context information not seen during training, and found that fine-tuning and prompting methods substantially outperform traditional state-based approaches.

Real-world formal theorem proving often depends on a wealth of context, including definitions, lemmas, comments, file structure, and other information. We introduce miniCTX, which tests a model's ability to prove formal mathematical theorems that depend on new context that is not seen during training. miniCTX contains theorems sourced from real Lean projects and textbooks, each associated with a context that can span tens of thousands of tokens. Models are tasked with proving a theorem given access to code from the theorem's repository, which contains context that is needed for the proof. As a baseline for miniCTX, we tested fine-tuning and prompting methods that condition theorem proving on preceding context. Both approaches substantially outperform traditional methods that rely solely on state information. We found that this ability to use context is not captured by previous benchmarks such as miniF2F. Alongside miniCTX, we offer ntp-toolkit for automatically extracting and annotating theorem proving data, making it easy to add new projects into miniCTX to ensure that contexts are not seen during training. miniCTX offers a challenging and realistic evaluation of neural theorem provers.

Code Implementations4 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes