HO AI · Jun 16, 2025

Using Large Language Models to Study Mathematical Practice

arXiv:2507.02873v11.21 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for non-cherry-picked evidence in the philosophy of mathematical practice, offering a novel approach for philosophers to analyze large datasets, though it is incremental in applying existing LLM methods to a new domain.

The paper studies practices of mathematical explanation by using Google's Gemini 2.5 Pro to analyze a sample of 5000 mathematics papers from arXiv.org. The result is a dataset of hundreds of annotated examples, used to examine how often mathematicians discuss explanation and how explanatory practices vary by subject.

The philosophy of mathematical practice (PMP) looks to evidence from working mathematics to help settle philosophical questions. One prominent program under the PMP banner is the study of explanation in mathematics, which aims to understand what sorts of proofs mathematicians consider explanatory and what role the pursuit of explanation plays in mathematical practice. In an effort to address worries about cherry-picked examples and file-drawer problems in PMP, a handful of authors have recently turned to corpus analysis methods as a promising alternative to small-scale case studies. This paper reports the results of such a corpus study facilitated by Google's Gemini 2.5 Pro, a model whose reasoning capabilities, advances in hallucination control, and large context window allow for the accurate analysis of hundreds of pages of text per query. Based on a sample of 5000 mathematics papers from arXiv.org, the experiments yielded a dataset of hundreds of useful annotated examples. The study aims to gain insight into questions like the following: How often do mathematicians make claims about explanation in the relevant sense? Do mathematicians' explanatory practices vary in any noticeable way by subject matter? Which philosophical theories of explanation are most consistent with a large body of non-cherry-picked examples? How might philosophers make further use of AI tools to gain insights from large datasets of this kind? As the first PMP study to make extensive use of LLM methods, this paper also seeks to begin a conversation about these methods as research tools in practice-oriented philosophy and to evaluate the strengths and weaknesses of current models for such work.
