DC LG Oct 7, 2025

EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models

Zheyue Tan, Mustapha Abdullahi, Tuo Shi, Huining Yuan, Zelai Xu, Chao Yu, Boxun Li, Bo Zhao

arXiv:2510.05943v1 1.2 h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses practical system bottlenecks for researchers and engineers scaling agentic RL systems, representing an incremental improvement in efficiency.

The paper tackles the problem of scaling agentic reinforcement learning for large language models by addressing bottlenecks in context length and data movement, resulting in increased throughput and reduced failures without imposing hard limits on context length.

Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training, and agentic RL extends this paradigm by having LLMs operate as agents through multi-turn interaction and tool use. Scaling such systems exposes two practical bottlenecks: (1) context length grows rapidly during training, inflating memory usage and latency and triggering out-of-memory (OOM) failures; and (2) intermediate tensors accumulate with context length, making cross-device data movement a major system bottleneck. We present EARL, a scalable system for efficient agentic RL. EARL introduces a parallelism selector that dynamically adapts model and training parallelism across RL stages based on sequence length and system load, and a data dispatcher that performs layout-aware, decentralized exchange of intermediate data batches. Together, these components increase throughput, reduce long-context failures, and enable stable large-scale training of agentic LLMs without relying on hard limits or penalties on context length.
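
The abstract does not say how the parallelism selector makes its decision. As a rough illustration of the general idea only, the following minimal Python sketch picks a tensor-parallel degree from the current context length. Everything in it is an assumption rather than EARL's method: the names `ModelSpec`, `kv_cache_bytes`, and `select_tp_degree` are hypothetical, the memory model counts only weights and KV cache (activations are ignored), and memory is assumed to split evenly across tensor-parallel ranks. System load, which the abstract also lists as an input, is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical model description; the fields are not taken from the paper."""
    param_bytes: int      # total bytes of model weights
    num_layers: int
    num_kv_heads: int
    head_dim: int
    dtype_bytes: int = 2  # e.g. bf16


def kv_cache_bytes(spec: ModelSpec, seq_len: int, batch_size: int) -> int:
    """Estimate KV-cache size: keys and values (factor 2) per layer, per token."""
    per_token = 2 * spec.num_layers * spec.num_kv_heads * spec.head_dim * spec.dtype_bytes
    return per_token * seq_len * batch_size


def select_tp_degree(
    spec: ModelSpec,
    seq_len: int,
    batch_size: int,
    device_mem_bytes: int,
    candidates: Sequence[int] = (1, 2, 4, 8),
    headroom: float = 0.9,
) -> Optional[int]:
    """Return the smallest candidate tensor-parallel degree whose per-device
    share of weights plus KV cache fits in the memory budget, or None if no
    candidate fits. Assumes an even split across ranks (a simplification)."""
    budget = device_mem_bytes * headroom
    total = spec.param_bytes + kv_cache_bytes(spec, seq_len, batch_size)
    for tp in sorted(candidates):
        if total / tp <= budget:
            return tp
    return None


if __name__ == "__main__":
    GiB = 1024 ** 3
    spec = ModelSpec(param_bytes=16 * GiB, num_layers=32, num_kv_heads=8, head_dim=128)
    # The chosen degree grows as the context length grows.
    for seq_len in (8192, 32768, 65536):
        tp = select_tp_degree(spec, seq_len, batch_size=8, device_mem_bytes=40 * GiB)
        print(seq_len, tp)
```

Under these assumptions, a selector of this kind would be re-evaluated as rollouts lengthen, so that short-context stages keep a low degree of parallelism and only long-context stages pay the extra communication cost of a higher one.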
