AINov 30, 2025

IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch

Microsoft
arXiv:2512.00997v13 citationsh-index: 65Has Code
Originality Synthesis-oriented
AI Analysis

This provides a challenging testbed for mathematical reasoning research, though it is incremental as it builds on existing autoformalization and benchmarking efforts.

The authors tackled the challenge of evaluating mathematical theorem proving by introducing IndiMathBench, a human-verified benchmark of 312 formal Lean 4 theorems paired with informal statements from Indian Mathematics Olympiads, and found that autoformalization remains difficult with low theorem proving success rates even with refinement.

We introduce IndiMathBench, a human-verified benchmark designed to evaluate mathematical theorem proving, curated using an AI-powered human-assisted pipeline for formalizing natural language problems in Lean. IndiMathBench is composed of 312 formal Lean 4 theorems paired with their corresponding informal problem statements, sourced from Indian Mathematics Olympiads. Through category-based retrieval, iterative compiler feedback, and multi-model ensembles, our pipeline generates candidate formalizations that experts efficiently validate via an interactive dashboard with automated quality summaries. Evaluation across multiple frontier models demonstrates that autoformalization remains challenging, with substantial gaps between syntactic validity and semantic correctness, while theorem proving success rates remain low even with iterative refinement, demonstrating that \benchmark~presents a challenging testbed for mathematical reasoning. IndiMathBench is available at https://github.com/prmbiy/IndiMathBench.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes