CLIR, Apr 1

MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

arXiv:2604.25926
AI Analysis

The paper addresses the shortage of non-English math reasoning benchmarks by providing one written natively in Portuguese, so that LLMs can be evaluated on problems that are not in English or translated from English.

The paper introduces Math-PT, a dataset of 1,729 math problems in European and Brazilian Portuguese, and benchmarks LLMs on it, finding that frontier reasoning models perform well on multiple-choice questions but struggle with figures and open-ended questions.

The use of large language models (LLMs) for complex mathematical reasoning is an emergent area of research, with fast progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations exhibit a significant linguistic bias, with the vast majority of benchmark datasets being exclusively in English or (at best) translated from English. We address this limitation by introducing Math-PT, a novel dataset comprising 1,729 mathematical problems written in European and Brazilian Portuguese. Math-PT is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil. We present a comprehensive benchmark of current state-of-the-art LLMs on Math-PT, revealing that frontier reasoning models achieve strong performance in multiple-choice questions compared to open-weight models, but that their performance decreases for questions with figures or open-ended questions. To facilitate future research, we release the benchmark dataset and model outputs.
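
The breakdown the abstract describes (accuracy by question format and by whether a question includes a figure) could be computed along the following lines. This is only a minimal sketch: the record fields `format`, `has_figure`, and `correct` are hypothetical and are not taken from the released dataset, and the toy records are made up for illustration, not Math-PT data or results.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group value: fraction of correct answers} for one record field."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        correct[group] += int(rec["correct"])
    return {group: correct[group] / totals[group] for group in totals}


# Made-up toy records with a hypothetical schema (not the Math-PT format).
toy_records = [
    {"format": "multiple_choice", "has_figure": False, "correct": True},
    {"format": "multiple_choice", "has_figure": True, "correct": False},
    {"format": "open_ended", "has_figure": False, "correct": True},
    {"format": "open_ended", "has_figure": True, "correct": False},
    {"format": "multiple_choice", "has_figure": False, "correct": True},
]

# Accuracy split by question format, then by presence of a figure.
by_format = accuracy_by_group(toy_records, "format")
by_figure = accuracy_by_group(toy_records, "has_figure")
```

Grouping on one field at a time keeps the helper generic; the same function could be applied to a source or country field if the released data provides one.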
