LG, AI, DC | Oct 8, 2023

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai

arXiv:2310.05205v1 | 7.75 citations | h-index: 60 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses a domain-specific problem for reinforcement learning researchers and practitioners by providing a more efficient system for training large models. The advance is incremental, since it builds on existing replay systems.

The paper tackles the bottlenecks in memory, computation, and communication for scalable reinforcement learning with large sequence models by introducing GEAR, a distributed, GPU-centric experience replay system, achieving up to 6x performance improvement over Reverb in cluster experiments.

This paper introduces GEAR, a distributed, GPU-centric experience replay system designed for scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR improves memory efficiency by letting the memory resources on GPU servers (both host memory and device memory) manage trajectory data. It also lets decentralized GPU devices accelerate various trajectory selection strategies, avoiding computational bottlenecks. To improve communication efficiency, GEAR provides GPU kernels that collect trajectories using zero-copy access to host memory, along with remote direct memory access (RDMA) over InfiniBand. Cluster experiments show that GEAR can achieve up to 6x higher performance than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.
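
For readers unfamiliar with experience replay, the following is a minimal single-process sketch of what a trajectory store with a selection strategy does. It is not GEAR's API or implementation: the class `TrajectoryReplay`, its methods, and the priority-weighted sampling rule are all hypothetical names and choices made for illustration, and the sketch uses plain Python lists on the CPU rather than GPU memory, zero-copy kernels, or RDMA.

```python
import random
from collections import deque


class TrajectoryReplay:
    """Toy trajectory store with priority-weighted sampling (illustrative only)."""

    def __init__(self, capacity, seed=None):
        # Bounded deques: once full, appending evicts the oldest entry.
        self.trajectories = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, trajectory, priority=1.0):
        """Store one trajectory (a sequence of steps) with a positive priority."""
        if priority <= 0:
            raise ValueError("priority must be positive")
        self.trajectories.append(list(trajectory))
        self.priorities.append(float(priority))

    def sample(self, batch_size):
        """Draw trajectories with replacement, proportionally to their priority."""
        if not self.trajectories:
            raise ValueError("cannot sample from an empty replay store")
        return self.rng.choices(
            list(self.trajectories),
            weights=list(self.priorities),
            k=batch_size,
        )

    def __len__(self):
        return len(self.trajectories)


# Usage: insert five one-step trajectories into a store that holds three.
replay = TrajectoryReplay(capacity=3, seed=0)
for i in range(5):
    step = (f"s{i}", f"a{i}", float(i))  # (state, action, reward) placeholder
    replay.add([step], priority=i + 1)

batch = replay.sample(batch_size=4)
```

In this toy version, storage and selection run in one Python process. The abstract describes moving both onto GPU servers and distributing selection across devices, which this sketch does not attempt to model.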
