LG | Aug 3, 2025

Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention

arXiv:2508.01604v111.43 citations | h-index: 1 | Has Code

Originality: Incremental advance

AI Analysis

This work targets a practical problem for researchers: replicating state-of-the-art math reasoning results in small language models. It appears incremental, since it builds on existing open-source frameworks.

The paper tackles the problem of replicating reinforcement learning training results for math reasoning in small language models by proposing an Early Preview Reinforcement Learning algorithm with difficulty-aware intervention. Their method applied to a 1.5B-parameter model achieves 50.0% on AIME24, 89.2% on Math500, 77.1% on AMC, 35.3% on Minerva, and 51.9% on OBench, surpassing O1-Preview and matching O1-mini.

Reinforcement learning scaling enhances the reasoning capabilities of large language models, with reinforcement learning serving as the key technique to draw out complex reasoning. However, key technical details of state-of-the-art reasoning LLMs, such as those in the OpenAI O series, Claude 3 series, DeepMind's Gemini 2.5 series, and Grok 3 series, remain undisclosed, making it difficult for the research community to replicate their reinforcement learning training results. Therefore, we start our study from an Early Preview Reinforcement Learning (EPRLI) algorithm built on the open-source GRPO framework, incorporating difficulty-aware intervention for math problems. Applied to a 1.5B-parameter LLM, our method achieves 50.0% on AIME24, 89.2% on Math500, 77.1% on AMC, 35.3% on Minerva, and 51.9% on OBench, surpassing O1-Preview and performing comparably to O1-mini within standard school-lab settings.
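
The abstract does not spell out how EPRLI or its difficulty-aware intervention works, so the following is only a minimal sketch of the group-relative advantage commonly associated with GRPO, the framework the method builds on. The function names, the binary reward convention, and the use of the group pass rate as a difficulty proxy are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against the other samples drawn for the same prompt.

    This is the group-normalized advantage usually associated with GRPO.
    eps avoids division by zero when every sample gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def empirical_pass_rate(rewards):
    """Hypothetical difficulty proxy (an assumption, not the paper's definition):
    the fraction of sampled answers that received a positive reward."""
    return sum(1 for r in rewards if r > 0) / len(rewards)


# Four sampled answers to one math problem, rewarded 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
pass_rate = empirical_pass_rate(rewards)
```

A problem that every sample solves (or none solves) yields zero advantages for the whole group and so contributes no learning signal. That is one plausible reason a training recipe might treat problems differently by difficulty, though whether EPRLI does so in this way is not stated here.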
