CL, AI | Oct 13, 2025

Conjecturing: An Overlooked Step in Formal Mathematical Reasoning

Jasivan Alex Sivakumar, Philipp Borchert, Ronald Cardenas, Gerasimos Lampouras

arXiv:2510.11986v1 | 2.71 citations | h-index: 13

Originality: Incremental advance

AI Analysis

The paper addresses a critical gap in formal mathematical reasoning for AI researchers, though the advance is incremental: it builds on existing autoformalisation work by focusing on an overlooked step.

The paper argues that autoformalisation in mathematics often requires a preceding conjecturing step, which current evaluations overlook. Once conjecturing is accounted for, LLMs' autoformalisation performance turns out to be substantially overestimated. The proposed inference-time method, Lean-FIRe, achieves successful end-to-end autoformalisation of 13 PutnamBench problems with GPT-4.1 and 7 with DeepSeek-V3.1.

Autoformalisation, the task of expressing informal mathematical statements in formal language, is often viewed as a direct translation process. This, however, disregards a critical preceding step: conjecturing. Many mathematical problems cannot be formalised directly without first conjecturing a conclusion, such as an explicit answer or a specific bound. Since Large Language Models (LLMs) already struggle with autoformalisation, and the evaluation of their conjecturing ability is limited and often entangled with autoformalisation or proof, it is particularly challenging to understand the effect of conjecturing. To address this gap, we augment existing datasets to create ConjectureBench, and redesign the evaluation framework and metric specifically to measure the conjecturing capabilities of LLMs both as a distinct task and within the autoformalisation pipeline. Our evaluation of foundational models, including GPT-4.1 and DeepSeek-V3.1, reveals that their autoformalisation performance is substantially overestimated when the conjecture is accounted for during evaluation. However, the conjecture should not be assumed to be provided. We design an inference-time method, Lean-FIRe, to improve conjecturing and autoformalisation, which, to the best of our knowledge, achieves the first successful end-to-end autoformalisation of 13 PutnamBench problems with GPT-4.1 and 7 with DeepSeek-V3.1. We demonstrate that while LLMs possess the requisite knowledge to generate accurate conjectures, improving autoformalisation performance requires treating conjecturing as an independent task, and investigating further how to correctly integrate it within autoformalisation. Finally, we provide forward-looking guidance to steer future research toward improving conjecturing, an overlooked step of formal mathematical reasoning.
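
To make the distinction between translation and conjecturing concrete, the following Lean 4 sketch formalises a toy problem, "determine all integers x with x^2 = 4", in two ways. The problem, the names `toy_solution` and `toy`, and the two namespaces are hypothetical illustrations and are not taken from the paper or from ConjectureBench; the answer-placeholder layout is only assumed to resemble the style used in benchmarks such as PutnamBench. Proofs are left as `sorry` because only the statement is of interest here.

```lean
import Mathlib

namespace WithoutConjecture

-- The answer is left as a placeholder. The statement type-checks,
-- but it says nothing until someone supplies the solution set.
abbrev toy_solution : Set ℤ := sorry

theorem toy (x : ℤ) : x ^ 2 = 4 ↔ x ∈ toy_solution := by
  sorry

end WithoutConjecture

namespace WithConjecture

-- The conjecturing step: commit to an explicit answer.
-- (Hypothetical conjecture for this toy problem.)
abbrev toy_solution : Set ℤ := {-2, 2}

-- Only now is the formal statement a complete claim that a prover
-- could attempt to establish or refute.
theorem toy (x : ℤ) : x ^ 2 = 4 ↔ x ∈ toy_solution := by
  sorry

end WithConjecture
```

In the first version the formaliser is handed a slot for the answer, which is the setting the abstract describes as assuming the conjecture is provided. In the second, the model must produce `{-2, 2}` itself before the statement means anything, which is the step the paper argues should be evaluated in its own right.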
