CLJul 15, 2025

FMC: Formalization of Natural Language Mathematical Competition Problems

arXiv:2507.11275v13 citationsh-index: 15
Originality Incremental advance
AI Analysis

This work addresses the need for high-quality benchmarks in formal mathematical reasoning, particularly for automated theorem provers, though it is incremental in improving autoformalization methods.

The paper tackled the problem of autoformalizing natural language mathematical competition problems by proposing a training-free pipeline using large language models with error feedback, resulting in a dataset of 3,922 natural language and 9,787 Lean formalizations with 64.46% assessed as above-average quality.

Efficient and accurate autoformalization methods, which leverage large-scale datasets of extensive natural language mathematical problems to construct formal language datasets, are key to advancing formal mathematical reasoning. In this paper, we propose an autoformalization pipeline based on large language models with error feedback, achieving a fully automatic and training-free formalization approach. Using this pipeline, we curate an Olympiad-level dataset aligning natural language problems with Lean formalizations. The dataset comprises $3,922$ mathematical problems in natural language and $9,787$ in Lean, of which $64.46\%$ were assessed as at least above-average quality, making it suitable as a benchmark for automated theorem provers. Additionally, we investigate the formalization and reasoning capabilities of various LLMs and empirically demonstrate that few-shot learning, error feedback, and increasing sampling numbers enhance the autoformalization process. Experiments of three automated theorem provers on the \dataset\ dataset also highlight its challenging nature and its value as a benchmark for formal reasoning tasks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes