CL · Mar 11

mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR

Konstantin Dobler, Simon Lehnerer, Federico Scozzafava, Jonathan Janke, Mohamed Ali

arXiv:2603.10767v124.9h-index: 4

Predicted impact top 21% in CL · last 90 days · Originality: Synthesis-oriented

AI Analysis

This work addresses the English-centric nature of existing RLVR datasets, enabling multilingual training and benchmarking. The contribution is incremental, since it builds on existing data and methods.

The authors tackled the lack of high-quality multilingual training data for Reinforcement Learning with Verifiable Rewards (RLVR) in math domains by creating mAceReason-Math, a dataset of over 10,000 challenging math problems per language across 14 languages, sourced from an RLVR-curated corpus and carefully translated.

Reinforcement Learning with Verifiable Rewards (RLVR) has been successfully applied to significantly boost the capabilities of pretrained large language models, especially in the math and logic problem domains. However, current research and available training datasets remain English-centric. While multilingual training data and benchmarks have been created in the past, they were not created with RLVR and current model capability in mind, and their level of difficulty is often too low to provide appropriate training signals for current models. To address this gap, we provide mAceReason-Math, a dataset of high-quality translations of challenging math problems sourced from a corpus specifically curated for RLVR (AceReason-Math). We further take specific care to clean and improve our translations, resulting in a coverage of 14 languages with more than 10,000 samples per language. We release the dataset to facilitate multilingual RLVR research and benchmarking in the research community.
