Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
This paper addresses scalability issues in distributed deep reinforcement learning systems, though the contribution is incremental because it builds on existing A2C methods.
The paper aims to improve computational efficiency and scalability in multi-simulator deep reinforcement learning. It proposes Gossip-based Actor-Learner Architectures (GALA), which use asynchronous gossip communication to reduce synchronization between agents, yielding higher hardware utilization and frame rates than A2C.
Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughput. We propose Gossip-based Actor-Learner Architectures (GALA), where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology and exchange information through asynchronous gossip in order to take advantage of a large number of distributed simulators. We prove that GALA agents remain within an epsilon-ball of one another during training when using loosely coupled asynchronous communication. By reducing the amount of synchronization between agents, GALA is more computationally efficient and scalable than A2C, its fully synchronous counterpart. GALA also outperforms A2C, being more robust and sample efficient. We show that we can run several loosely coupled GALA agents in parallel on a single GPU and achieve significantly higher hardware utilization and frame rates than vanilla A2C at comparable power draws.
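To make the gossip idea concrete, here is a minimal toy sketch of parameter mixing between peers. It is not the paper's algorithm: it assumes a directed ring topology, a fixed mixing weight of 0.5, synchronous rounds (GALA's communication is asynchronous), and a random perturbation standing in for each agent's local A2C update. All function names and parameters (gossip_round, local_step, mix, step_size) are hypothetical and chosen only for illustration.

```python
import random


def gossip_round(params, mix=0.5):
    """One synchronous gossip round on a directed ring (assumed topology).

    Agent i blends its own parameters with those of its predecessor
    (i - 1) mod n, using the assumed mixing weight `mix`.
    """
    n = len(params)
    snapshot = [list(p) for p in params]  # messages are sent before anyone mixes
    return [
        [(1.0 - mix) * own + mix * peer
         for own, peer in zip(snapshot[i], snapshot[(i - 1) % n])]
        for i in range(n)
    ]


def local_step(params, rng, step_size=0.1):
    """Stand-in for a local learning update: a bounded random perturbation."""
    return [[w + step_size * rng.uniform(-1.0, 1.0) for w in p] for p in params]


def max_pairwise_distance(params):
    """Largest Euclidean distance between any two agents' parameter vectors."""
    return max(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for p in params
        for q in params
    )


def simulate(n_agents=4, dim=3, rounds=200, gossip=True, seed=0):
    """Alternate local updates and (optionally) gossip; report final spread."""
    rng = random.Random(seed)
    params = [[0.0] * dim for _ in range(n_agents)]
    for _ in range(rounds):
        params = local_step(params, rng)
        if gossip:
            params = gossip_round(params)
    return max_pairwise_distance(params)
```

Calling simulate(gossip=True) and simulate(gossip=False) with the same seed applies identical perturbations in both runs, so any difference in the final spread comes from the mixing step alone. In this toy setting the spread with gossip stays bounded because each round contracts the disagreement between agents while the perturbations are bounded. That is the flavor of the epsilon-ball statement in the abstract, although the paper's result concerns asynchronous communication and real gradient updates.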