LG, DC | May 12, 2025

INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning

Prime Intellect Team, Sami Jaghouar, Justus Mattern, Jack Min Ong, Jannik Straube, Manveer Basra, Aaron Pazdera, Kushal Thaman, Matthew Di Ferrante, Felix Gabriel, Fares Obeid, Kemal Erdem

arXiv:2505.07291v126.925 citations | h-index: 5 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of scalable and open AI training for the research community, though it is incremental in advancing decentralized methods.

The authors train a 32-billion-parameter reasoning model with fully asynchronous, decentralized reinforcement learning across a global compute swarm, and report that the resulting model improves upon QwQ-32B.

We introduce INTELLECT-2, the first globally distributed reinforcement learning (RL) training run of a 32 billion parameter language model. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning model using fully asynchronous RL across a dynamic, heterogeneous swarm of permissionless compute contributors. To enable a training run with this unique infrastructure, we built various components from scratch: we introduce PRIME-RL, our training framework purpose-built for distributed asynchronous reinforcement learning, built on top of novel components such as TOPLOC, which verifies rollouts from untrusted inference workers, and SHARDCAST, which efficiently broadcasts policy weights from training nodes to inference workers. Beyond infrastructure components, we propose modifications to the standard GRPO training recipe and data filtering techniques that were crucial to achieving training stability and ensuring that our model successfully learned its training objective, thus improving upon QwQ-32B, the state-of-the-art reasoning model in the 32B parameter range. We open-source INTELLECT-2 along with all of our code and data, hoping to encourage and enable more open research in the field of decentralized training.
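For readers unfamiliar with GRPO, the following is a minimal sketch of the group-relative advantage step used in the standard recipe that the abstract refers to. It is not the modified recipe from the paper, whose details are not given here. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts sampled for the same prompt.

    Illustrative only: standard GRPO scores each rollout relative to its
    group's mean reward, scaled by the group's reward spread. The eps term
    (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts with binary task rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to form a baseline.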

