LG · AI · Sep 29, 2025

Putnam-like dataset summary: LLMs as mathematical competition contestants

arXiv:2509.24827v2 · 2 citations · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of evaluating LLMs' mathematical reasoning abilities for researchers and AI developers, but it is incremental as it applies existing methods to a new dataset.

The paper analyzed the performance of large language models (LLMs) on a dataset of 96 Putnam-like mathematical competition problems, finding that they achieved a 0% success rate, indicating they cannot solve such problems effectively.

In this paper we summarize the results of the Putnam-like benchmark published by Google DeepMind. The dataset consists of 96 original problems in the spirit of the Putnam Competition and 576 LLM-generated solutions. We analyse the models' performance on these problems to assess their ability to solve problems from mathematical contests.

