LG · Feb 10

Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning

arXiv:2602.09578v1 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses system-level bottlenecks faced by researchers and practitioners who deploy large-scale LLM-based MARL. It is incremental in that it builds on existing MARL concepts and focuses on infrastructure optimization rather than new learning algorithms.

The paper tackles the inefficiency of existing training frameworks for large-scale multi-agent reinforcement learning (MARL). It proposes FlexMARL, an end-to-end framework that jointly optimizes rollout, training, and their orchestration, and reports up to 7.3x speedup and up to 5.6x higher hardware utilization on a large-scale production cluster.

Despite algorithm-level innovations in multi-agent reinforcement learning (MARL), the underlying networked infrastructure for large-scale MARL training remains underexplored. Existing training frameworks are optimized primarily for single-agent scenarios and fail to address the unique system-level challenges of MARL, including rollout-training synchronization barriers, rollout load imbalance, and training resource underutilization. To bridge this gap, we propose FlexMARL, the first end-to-end training framework that holistically optimizes rollout, training, and their orchestration for large-scale LLM-based MARL. Specifically, FlexMARL introduces a joint orchestrator that manages data flow under a rollout-training disaggregated architecture. Building on an experience store, a novel micro-batch driven asynchronous pipeline eliminates synchronization barriers while providing strong consistency guarantees. The rollout engine adopts a parallel sampling scheme combined with hierarchical load balancing, which adapts to skewed inter- and intra-agent request patterns. The training engine achieves on-demand hardware binding through agent-centric resource allocation, and the training states of different agents are swapped via unified, location-agnostic communication. Empirical results on a large-scale production cluster demonstrate that FlexMARL achieves up to 7.3x speedup and improves hardware utilization by up to 5.6x compared to existing frameworks.
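As a rough illustration of how a micro-batch driven pipeline can remove the rollout-training barrier, here is a minimal single-process sketch under our own assumptions, not FlexMARL's implementation. A rollout thread pushes micro-batches, each tagged with the policy version that generated it, into a queue standing in for the experience store. A trainer thread starts consuming (where gradient accumulation would happen) as soon as the first micro-batch lands instead of waiting for the full batch. We assume "strong consistency" means every optimizer step sees only samples from the policy version it is updating. `ExperienceStore`, `MicroBatch`, `rollout_worker`, `trainer`, and `run_pipeline` are hypothetical names.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class MicroBatch:
    agent_id: str
    policy_version: int  # version of the policy that generated these samples
    samples: list


class ExperienceStore:
    """Hypothetical stand-in for an experience store: a thread-safe FIFO."""

    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()

    def put(self, mb) -> None:
        self._q.put(mb)

    def get(self):
        return self._q.get()  # blocks until a micro-batch (or sentinel) arrives


def rollout_worker(store, version_ready, num_iters, mbs_per_iter, agent_ids):
    for version in range(num_iters):
        version_ready[version].wait()  # wait until policy `version` is published
        for i in range(mbs_per_iter):
            agent = agent_ids[i % len(agent_ids)]
            samples = [f"{agent}/v{version}/mb{i}/s{j}" for j in range(2)]
            store.put(MicroBatch(agent, version, samples))  # trainer may start now
    store.put(None)  # sentinel: rollout finished


def trainer(store, version_ready, mbs_per_iter, log):
    version, consumed = 0, 0
    while (mb := store.get()) is not None:
        # Record (policy being trained, policy that generated the data);
        # a stricter implementation might reject any mismatch outright.
        log.append((version, mb.policy_version, mb.agent_id))
        consumed += 1  # gradient accumulation on `mb` would happen here
        if consumed == mbs_per_iter:
            version, consumed = version + 1, 0  # optimizer step -> new policy
            if version < len(version_ready):
                version_ready[version].set()  # publish new weights to rollout


def run_pipeline(num_iters=3, mbs_per_iter=4, agent_ids=("planner", "coder")):
    store = ExperienceStore()
    version_ready = [threading.Event() for _ in range(num_iters)]
    version_ready[0].set()  # initial policy v0 is available up front
    log = []
    threads = [
        threading.Thread(target=rollout_worker,
                         args=(store, version_ready, num_iters, mbs_per_iter, list(agent_ids))),
        threading.Thread(target=trainer, args=(store, version_ready, mbs_per_iter, log)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    log = run_pipeline()
    print(all(trained == generated for trained, generated, _ in log))  # True
```

Training on micro-batch k overlaps with generating micro-batch k+1, so the trainer no longer idles for a whole batch. Rollout still waits for each new policy version before starting the next iteration, which is the cost of the consistency assumption made in this sketch.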
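Hierarchical load balancing can be read as two nested decisions. The sketch below is one plausible instance under stated assumptions, not the paper's algorithm. At the inter-agent level, rollout instances are split across agents in proportion to each agent's request load (largest-remainder rounding, at least one instance per agent). At the intra-agent level, each request is sent to that agent's currently least-loaded instance, tracked with a min-heap. The function names `allocate_instances` and `dispatch`, and the use of token counts as the load measure, are hypothetical.

```python
import heapq


def allocate_instances(load: dict[str, int], total: int) -> dict[str, int]:
    """Inter-agent level: split `total` rollout instances across agents
    in proportion to their request load, with at least one instance each."""
    agents = sorted(load)
    if total < len(agents):
        raise ValueError("need at least one instance per agent")
    alloc = {a: 1 for a in agents}
    spare = total - len(agents)
    total_load = sum(load.values())
    if spare == 0 or total_load == 0:
        return alloc
    quotas = {a: spare * load[a] / total_load for a in agents}
    for a in agents:
        alloc[a] += int(quotas[a])  # floor of each agent's proportional share
    leftover = spare - sum(int(q) for q in quotas.values())
    # Largest-remainder rounding: hand leftover instances to the agents
    # whose fractional share was cut the most by flooring.
    by_remainder = sorted(agents, key=lambda a: quotas[a] - int(quotas[a]), reverse=True)
    for a in by_remainder[:leftover]:
        alloc[a] += 1
    return alloc


def dispatch(requests: list[tuple[str, int]], alloc: dict[str, int]) -> dict[str, list[int]]:
    """Intra-agent level: route each (agent, token_cost) request to the
    least-loaded instance of that agent; returns per-instance load."""
    heaps = {a: [(0, i) for i in range(n)] for a, n in alloc.items()}  # sorted list is a valid heap
    loads = {a: [0] * n for a, n in alloc.items()}
    for agent, cost in requests:
        load, idx = heapq.heappop(heaps[agent])
        load += cost
        loads[agent][idx] = load
        heapq.heappush(heaps[agent], (load, idx))
    return loads


if __name__ == "__main__":
    print(allocate_instances({"planner": 10, "coder": 30}, total=6))
    # {'coder': 4, 'planner': 2}
    reqs = [("coder", 8), ("coder", 1), ("coder", 1), ("coder", 1), ("coder", 5), ("planner", 3)]
    print(dispatch(reqs, {"coder": 2, "planner": 1}))
    # {'coder': [8, 8], 'planner': [3]}
```

With these skewed request lengths, plain round-robin inside `coder` would load its two instances with 14 and 2 tokens, while the least-loaded rule yields 8 and 8.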

