Adaptive Policy Synchronization for Scalable Reinforcement Learning
The work addresses scalable RL for researchers and practitioners who train on distributed systems, though the contribution appears incremental because it builds on existing frameworks such as Gymnasium.
The paper tackles the problem of scaling reinforcement learning across distributed machines. It introduces ClusterEnv, a lightweight interface built on the DETACH pattern, together with Adaptive Policy Synchronization (APS), which reduces synchronization overhead while maintaining performance. Experiments on discrete control tasks show that APS cuts this overhead.
Scaling reinforcement learning (RL) often requires running environments across many machines, but most frameworks tie simulation, training, and infrastructure into rigid systems. We introduce ClusterEnv, a lightweight interface for distributed environment execution that preserves the familiar Gymnasium API. ClusterEnv uses the DETACH pattern, which moves environment reset() and step() operations to remote workers while keeping learning centralized. To reduce policy staleness without heavy communication, we propose Adaptive Policy Synchronization (APS), where workers request updates only when divergence from the central learner grows too large. ClusterEnv supports both on- and off-policy methods, integrates into existing training code with minimal changes, and runs efficiently on clusters. Experiments on discrete control tasks show that APS maintains performance while cutting synchronization overhead. Source code is available at https://github.com/rodlaf/ClusterEnv.
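To make the APS idea concrete, the following is a minimal sketch of a divergence-triggered sync check. It is not the ClusterEnv implementation. It assumes a discrete action space, that divergence is measured as the mean KL divergence between the worker's and the learner's action distributions on a shared batch of states, and that a fixed threshold decides when to request new weights. The names `mean_kl`, `should_sync`, and `threshold` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_kl(local_logits_batch, central_logits_batch):
    """Average KL between worker and learner policies over a batch of states."""
    kls = [
        kl_divergence(softmax(local), softmax(central))
        for local, central in zip(local_logits_batch, central_logits_batch)
    ]
    return sum(kls) / len(kls)


def should_sync(local_logits_batch, central_logits_batch, threshold=0.05):
    """Assumed rule: request a policy update only when divergence is too large."""
    return mean_kl(local_logits_batch, central_logits_batch) > threshold


# A worker whose policy matches the learner does not need to sync.
same = [[1.0, 2.0, 0.5], [0.0, 0.0, 0.0]]
print(should_sync(same, same))

# A worker whose policy has drifted far from the learner requests an update.
stale = [[0.0, 0.0]]
fresh = [[5.0, -5.0]]
print(should_sync(stale, fresh))
```

Under these assumptions, a worker that stays close to the learner keeps stepping its environments without communicating, and only a worker that has drifted pays the cost of a weight transfer. How the worker obtains the learner's distributions cheaply, and which divergence measure and threshold are used in practice, are details the abstract does not specify.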