CL AI · Feb 20, 2025

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Tian Xie, Zitian Gao, Qingnan Ren, Haoming Luo, Yuqian Hong, Bryan Dai, Joey Zhou, Kai Qiu, Zhirong Wu, Chong Luo

arXiv:2502.14768v145.4212 citations · h-index: 7 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving reasoning capabilities in LLMs for tasks like math and logic, though it appears incremental as it builds on prior methods like DeepSeek-R1.

The authors tackled the problem of enhancing reasoning in large language models by applying rule-based reinforcement learning to synthetic logic puzzles, achieving a 7B model that generalizes to challenging math benchmarks like AIME and AMC after training on only 5K problems.

Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification. We make three key technical contributions that lead to effective and stable RL training: a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence. Our 7B model develops advanced reasoning skills, such as reflection, verification, and summarization, that are absent from the logic corpus. Remarkably, after training on just 5K logic problems, it generalizes to the challenging math benchmarks AIME and AMC.
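
As a rough illustration of what a stringent format reward might look like, the sketch below scores a response by structure alone. It is not the authors' implementation: the `<think>`/`<answer>` tag layout, the function name `format_reward`, and the +1/-1 reward values are all assumptions made for this example. The abstract only states that the reward penalizes outputs that take shortcuts.

```python
import re

# Assumed tag layout (hypothetical): reasoning in <think>, final result in <answer>.
_TAGS = ("<think>", "</think>", "<answer>", "</answer>")
_PATTERN = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*",
    re.DOTALL,
)


def format_reward(response: str) -> float:
    """Return +1.0 for a well-formed response and -1.0 otherwise.

    The reward values are illustrative assumptions, not taken from the paper.
    """
    # Each tag must appear exactly once, so repeated or missing sections fail.
    if any(response.count(tag) != 1 for tag in _TAGS):
        return -1.0

    # The whole response must be think-then-answer, with nothing outside the tags.
    match = _PATTERN.fullmatch(response)
    if match is None:
        return -1.0

    # Penalize the shortcut of leaving the reasoning or the answer blank.
    thinking, answer = match.group(1).strip(), match.group(2).strip()
    if not thinking or not answer:
        return -1.0

    return 1.0
```

A check like this looks only at structure. In a rule-based setup it would typically be paired with a separate answer-correctness check against the puzzle's known solution.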
