CL, AI, LG · Feb 5, 2024

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

arXiv:2402.03300v3 · 6753 citations · h-index: 33
AI Analysis

This paper addresses complex mathematical reasoning in language models, a problem relevant to AI researchers and developers, and represents a strong incremental advance in specialized model capabilities.

The paper tackles the challenge of mathematical reasoning in language models by introducing DeepSeekMath 7B, which achieves 51.7% on the MATH benchmark without external tools or voting, approaching the performance of Gemini-Ultra and GPT-4, and reaches 60.9% with self-consistency over 64 samples.
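
As a rough illustration of the self-consistency step mentioned above, the sketch below takes the final answers extracted from several sampled solutions and returns the most frequent one. The function name `majority_vote` and the example answers are hypothetical; how answers are sampled and extracted from model output is not shown and is not taken from the paper.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among sampled solutions.

    `answers` is a list of already-extracted final answers (strings).
    Ties are broken in favor of the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) yields a one-element list of (answer, count) pairs.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers from five sampled solutions to one problem.
samples = ["42", "41", "42", "7", "42"]
voted = majority_vote(samples)
```

With 64 samples, as in the reported 60.9% result, the same vote would simply run over a longer list.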

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
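
The group-relative idea behind GRPO can be sketched as follows, assuming the outcome-reward setting in which each of several sampled answers to the same question receives a single scalar reward. Each reward is normalized against the mean and standard deviation of its own group, so the baseline comes from the group rather than from a separately trained value model, which is where the memory saving relative to PPO comes from. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for this illustration, not details confirmed by the abstract.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled answers.

    rewards: scalar rewards for answers sampled for the same question.
    eps: small constant (an assumption here) to avoid division by zero
         when every answer in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    # Population standard deviation; the exact estimator is an assumption.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards: 1.0 for a correct answer, 0.0 for an incorrect one.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that score above their group's average get a positive advantage and those below get a negative one; a group in which every answer receives the same reward yields zero advantage for all of them.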

Code Implementations: 5 repos
