CLOct 28, 2024

Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency

MicrosoftU of Toronto
arXiv:2410.20936v234 citationsh-index: 38NIPS
Originality Incremental advance
AI Analysis

This work improves autoformalization for mathematics, which is incremental as it builds on existing LLM methods with self-consistency techniques.

The paper tackles the problem of autoformalizing mathematical statements by addressing the gap between pass@1 and pass@k accuracies in LLM-generated formalizations, introducing a framework that selects the best candidate using symbolic equivalence and semantic consistency, resulting in up to 0.22-1.35x relative improvements in accuracy on MATH and miniF2F datasets.

Autoformalization, the task of automatically translating natural language descriptions into a formal language, poses a significant challenge across various domains, especially in mathematics. Recent advancements in large language models (LLMs) have unveiled their promising capabilities to formalize even competition-level math problems. However, we observe a considerable discrepancy between pass@1 and pass@k accuracies in LLM-generated formalizations. To address this gap, we introduce a novel framework that scores and selects the best result from k autoformalization candidates based on two complementary self-consistency methods: symbolic equivalence and semantic consistency. Elaborately, symbolic equivalence identifies the logical homogeneity among autoformalization candidates using automated theorem provers, and semantic consistency evaluates the preservation of the original meaning by informalizing the candidates and computing the similarity between the embeddings of the original and informalized texts. Our extensive experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy, achieving up to 0.22-1.35x relative improvements across various LLMs and baseline methods.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes