CL AI LG · Jan 22, 2025

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang

Stanford · Tsinghua

arXiv:2501.12948v184.05427 citations · h-index: 25 · Has Code · Nature

Originality: Incremental advance

AI Analysis

This paper addresses the problem of enhancing reasoning in LLMs, which is relevant to AI researchers and developers. It appears incremental because it builds on existing RL methods and benchmarks.

The paper tackles improving reasoning capabilities in large language models by introducing DeepSeek-R1-Zero and DeepSeek-R1, trained via reinforcement learning, with DeepSeek-R1 achieving performance comparable to OpenAI-o1-1217 on reasoning tasks.

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, numerous powerful and intriguing reasoning behaviors emerge naturally in DeepSeek-R1-Zero. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
