OrchestrRL: Dynamic Compute and Network Orchestration for Disaggregated RL
Relevant to AI researchers and engineers facing scalability limits in large-scale RL training. The contribution is incremental, since it builds on existing disaggregated RL frameworks.
The paper targets two bottlenecks in disaggregated RL, dynamic generation workloads and dynamic network traffic, by introducing OrchestrRL, which dynamically orchestrates compute and network resources and achieves up to a 1.42x throughput improvement over static baselines on a 64-GPU (H800) testbed.
Disaggregating the generation and training stages in RL is widely adopted to scale LLM post-training, but it raises two critical challenges. First, the generation stage often becomes a bottleneck due to dynamic workload shifts and severe execution imbalances. Second, the decoupled stages produce diverse and dynamic network traffic patterns that strain a conventional static fabric. We build OrchestrRL to dynamically orchestrate both compute and network in disaggregated RL. On the compute side, OrchestrRL employs an adaptive scheduler that adjusts the parallelism configuration to match changing workload characteristics within and across generation steps. On the network side, OrchestrRL adopts a reconfigurable optical-electrical fabric called RFabric, which uses optical circuit switches to reconfigure the aggregation and core layers of the topology on demand, tailoring bandwidth to the distinct communication patterns of the training, generation, and weight synchronization phases. Evaluated on a testbed of 64 H800 GPUs, OrchestrRL demonstrates up to a 1.42x throughput improvement over static baselines. Using a high-fidelity simulator, we also show that RFabric achieves better performance-cost efficiency at scale than static Fat-Tree networks.
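To make the idea of an adaptive compute scheduler concrete, the following is a minimal sketch of how a parallelism configuration could be re-selected as the generation workload changes. It is not the scheduler from the paper. The names `ParallelConfig`, `candidate_configs`, `estimated_step_time`, and `choose_config`, the restriction to tensor parallelism plus replication, and the cost model with its constants are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    tensor_parallel: int  # GPUs cooperating on one model replica
    replicas: int         # independent replicas serving requests


def candidate_configs(num_gpus):
    """Enumerate power-of-two tensor-parallel degrees that use every GPU."""
    configs = []
    tp = 1
    while tp <= num_gpus:
        if num_gpus % tp == 0:
            configs.append(ParallelConfig(tp, num_gpus // tp))
        tp *= 2
    return configs


def estimated_step_time(cfg, active_requests,
                        weight_cost=8.0, request_cost=0.1, comm_cost=0.05):
    """Toy cost of one decode step (hypothetical model, not from the paper).

    - weight_cost: fixed cost of reading model weights, split across TP ranks
    - request_cost: per-request compute, also split across TP ranks
    - comm_cost: per-request communication cost, growing with log2(TP)
    """
    # Requests are spread evenly over replicas; the busiest replica sets the pace.
    batch = math.ceil(active_requests / cfg.replicas)
    compute = (weight_cost + request_cost * batch) / cfg.tensor_parallel
    comm = comm_cost * batch * math.log2(cfg.tensor_parallel)
    return compute + comm


def choose_config(num_gpus, active_requests):
    """Pick the candidate with the lowest estimated step time."""
    return min(candidate_configs(num_gpus),
               key=lambda cfg: estimated_step_time(cfg, active_requests))
```

Under these assumed constants, calling `choose_config(8, n)` with a shrinking number of active requests `n` moves from many small replicas toward a single wide replica, which mirrors the intuition that the long tail of a generation step benefits from a different configuration than its start. A real scheduler would also have to account for the cost of switching configurations, which this sketch ignores.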