LG · Apr 13, 2021

Podracer architectures for scalable Reinforcement Learning

Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt

arXiv:2104.06272v1 · 21.851 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of scalable RL training for AI researchers, but it appears incremental as it adapts existing hardware (TPUs) to a known bottleneck in RL.

The authors tackled the challenge of scaling reinforcement learning (RL) training by proposing architectures optimized for TPU Pods, aiming to achieve scalable, efficient, and reproducible RL agent training.

Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such as TensorFlow, PyTorch and JAX allow users to transparently make use of accelerators, such as TPUs and GPUs, to offload the more computationally intensive parts of training and inference in modern deep learning systems. Popular training pipelines that use these frameworks for deep learning typically focus on (un-)supervised learning. How to best train reinforcement learning (RL) agents at scale is still an active research area. In this report we argue that TPUs are particularly well suited for training RL agents in a scalable, efficient and reproducible way. Specifically, we describe two architectures designed to make the best use of the resources available on a TPU Pod (a special configuration in a Google data center that features multiple TPU devices connected to each other by extremely low-latency communication channels).
