DC, AI, LG | Dec 27, 2025

RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure

arXiv:2512.22560v1 | 9 citations | h-index: 16 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses infrastructure bottlenecks for large-scale agentic RL training, enabling faster development of autonomous AI agents, though it is incremental as it builds on existing disaggregation concepts.

The paper targets the heterogeneous workloads of agentic reinforcement learning (RL) training. It proposes RollArc, a distributed system that runs training on disaggregated infrastructure to improve throughput, and reports a 1.35-2.05× reduction in end-to-end training time compared to monolithic and synchronous baselines.
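To make the reported range concrete, the short calculation below converts a reduction factor into the share of baseline wall-clock time that remains. It assumes the factor is the ratio of baseline time to RollArc time, which the summary does not state explicitly; the function name is illustrative only.

```python
def remaining_time_fraction(speedup: float) -> float:
    """Fraction of baseline wall-clock time left after a given speedup factor.

    Assumption: the factor is (baseline time) / (new time).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# The two ends of the range reported in the summary.
for factor in (1.35, 2.05):
    frac = remaining_time_fraction(factor)
    print(f"{factor:.2f}x -> {frac:.1%} of baseline time, {1 - frac:.1%} saved")
```

Under that reading, a run would take roughly 74% of the baseline time at the low end and roughly 49% at the high end.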

Agentic Reinforcement Learning (RL) enables Large Language Models (LLMs) to perform autonomous decision-making and long-term planning. Unlike standard LLM post-training, agentic RL workloads are highly heterogeneous, combining compute-intensive prefill phases, bandwidth-bound decoding, and stateful, CPU-heavy environment simulations. We argue that efficient agentic RL training requires disaggregated infrastructure to leverage specialized, best-fit hardware. However, naive disaggregation introduces substantial synchronization overhead and resource underutilization due to the complex dependencies between stages. We present RollArc, a distributed system designed to maximize throughput for multi-task agentic RL on disaggregated infrastructure. RollArc is built on three core principles: (1) hardware-affinity workload mapping, which routes compute-bound and bandwidth-bound tasks to best-fit GPU devices, (2) fine-grained asynchrony, which manages execution at the trajectory level to mitigate resource bubbles, and (3) statefulness-aware computation, which offloads stateless components (e.g., reward models) to serverless infrastructure for elastic scaling. Our results demonstrate that RollArc effectively improves training throughput and achieves a 1.35-2.05× end-to-end training time reduction compared to monolithic and synchronous baselines. We also evaluate RollArc by training a hundreds-of-billions-parameter MoE model for the Qoder product on an Alibaba cluster with more than 3,000 GPUs, further demonstrating RollArc's scalability and robustness. The code is available at https://github.com/alibaba/ROLL.
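The three principles in the abstract can be illustrated with a toy model. The sketch below is not RollArc's implementation and does not use the ROLL API: the pool names, the `Trajectory` fields, and the durations are all hypothetical, and it assumes reward scoring scales elastically so that every trajectory can be scored in parallel. It shows why scheduling at the trajectory level can shrink the idle "bubble" that a batch-level barrier creates.

```python
from dataclasses import dataclass

# Hypothetical pool names. The abstract only says that compute-bound and
# bandwidth-bound work goes to best-fit GPUs and that stateless components
# (e.g., reward models) go to serverless infrastructure.
POOLS = {
    "prefill": "compute_optimized_gpu",
    "decode": "bandwidth_optimized_gpu",
    "environment": "cpu_cluster",
    "reward": "serverless",
}


def route(stage: str) -> str:
    """Return the (hypothetical) resource pool for a workload stage."""
    return POOLS[stage]


@dataclass(frozen=True)
class Trajectory:
    rollout_s: float  # made-up time to generate the trajectory
    reward_s: float   # made-up time to score it with a stateless reward model


def batch_synchronous_makespan(batch: list[Trajectory]) -> float:
    """Scoring starts only after the slowest rollout in the batch finishes."""
    rollout_done = max(t.rollout_s for t in batch)
    return rollout_done + max(t.reward_s for t in batch)


def trajectory_level_makespan(batch: list[Trajectory]) -> float:
    """Each trajectory is scored as soon as its own rollout finishes."""
    return max(t.rollout_s + t.reward_s for t in batch)


# Illustrative numbers only; none of these come from the paper.
batch = [Trajectory(10.0, 8.0), Trajectory(40.0, 2.0), Trajectory(25.0, 5.0)]

print(route("reward"))
print(batch_synchronous_makespan(batch))
print(trajectory_level_makespan(batch))
```

With these made-up durations the batch-level barrier finishes at 48 seconds and the trajectory-level schedule at 42 seconds, because the short rollout's slow reward call overlaps with the long rollout instead of waiting behind it.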

