CLLGLOJun 4, 2024

Process-Driven Autoformalization in Lean 4

arXiv:2406.01940v246 citationsHas Code
Originality Incremental advance
AI Analysis

This work addresses the problem of keeping autoformalization tools up-to-date with new formal languages like Lean 4 for researchers and practitioners in mathematical reasoning and AI, representing an incremental advancement by adapting existing methods to a specific domain.

The paper tackles the challenge of autoformalizing natural language mathematics into the rapidly evolving Lean 4 language by introducing a new benchmark (Formalization for Lean 4) and a Process-Supervised Verifier (PSV) model that uses compiler feedback to improve accuracy with less training data, achieving higher accuracy in autoformalization.

Autoformalization, the conversion of natural language mathematics into formal languages, offers significant potential for advancing mathematical reasoning. However, existing efforts are limited to formal languages with substantial online corpora and struggle to keep pace with rapidly evolving languages like Lean 4. To bridge this gap, we propose a new benchmark \textbf{Form}alization for \textbf{L}ean~\textbf{4} (\textbf{\name}) designed to evaluate the autoformalization capabilities of large language models (LLMs). This benchmark encompasses a comprehensive assessment of questions, answers, formal statements, and proofs. Additionally, we introduce a \textbf{P}rocess-\textbf{S}upervised \textbf{V}erifier (\textbf{PSV}) model that leverages the precise feedback from Lean 4 compilers to enhance autoformalization. Our experiments demonstrate that the PSV method improves autoformalization, enabling higher accuracy using less filtered training data. Furthermore, when fine-tuned with data containing detailed process information, PSV can leverage the data more effectively, leading to more significant improvements in autoformalization for Lean 4. Our dataset and code are available at \url{https://github.com/rookie-joe/PDA}.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes