CL | LG | Mar 13, 2025

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

arXiv:2503.10460v4 | 171 citations | h-index: 6 | Has Code | ACL
Originality: Incremental advance
AI Analysis

This work addresses the problem of making advanced reasoning models more accessible for real-world applications by providing an open-source alternative to proprietary methods, though it is incremental in improving existing training techniques.

The paper tackles training long reasoning models with a cost-effective, open-source curriculum approach that uses only public data. The resulting Light-R1 models match or outperform comparable models trained on proprietary data in math reasoning; for example, Light-R1-14B-DS achieves state-of-the-art scores among 14B models of 74.0 on AIME24 and 60.2 on AIME25.

This paper introduces Light-R1, an open-source suite for training long reasoning models using a reproducible and cost-effective methodology. Given the proprietary nature of data used in the DeepSeek-R1 series, we develop an alternative approach leveraging exclusively public data and models. Our curriculum training progressively increases data difficulty, combined with multi-staged post-training. Our Light-R1-32B model, trained from Qwen2.5-32B-Instruct, outperforms DeepSeek-R1-Distill-Qwen-32B in math reasoning. Experimental results show that this curriculum approach becomes more effective when distinct, diverse datasets are available for different training stages: fine-tuning DeepSeek-R1-Distilled models (pre-tuned by the DeepSeek team on proprietary data) with 3,000 challenging examples from our curriculum dataset yielded state-of-the-art 7B and 14B models, while the 32B model, Light-R1-32B-DS, performed comparably to QwQ-32B and DeepSeek-R1. Furthermore, we extend our work by applying GRPO on long reasoning models. Our final Light-R1-14B-DS achieves SOTA performance among 14B models in math, with AIME24 & 25 scores of 74.0 and 60.2, respectively, surpassing many 32B models and DeepSeek-R1-Distill-Llama-70B. Despite math-focused training, Light-R1-14B-DS demonstrates strong cross-domain generalization. Light-R1 represents a significant advancement in making sophisticated reasoning models more accessible and implementable in real-world applications. Our models, training data, and code are available at https://github.com/Qihoo360/Light-R1.
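The abstract mentions applying GRPO to long reasoning models. As a rough illustration of the idea behind GRPO (not the Light-R1 training code), the sketch below computes group-relative advantages: several responses are sampled for the same prompt, each gets a scalar reward, and each reward is normalized against the group's mean and standard deviation. The function name, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps). Using the population standard deviation
    and this eps value are assumptions, not details from the paper.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers to one math problem,
# scored 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers receive positive advantages and incorrect ones negative, so the policy update favors responses that beat their own group's average. When every response in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.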

Code Implementations: 1 repo
