GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning
The work addresses the high computational cost of deep reinforcement learning for researchers and practitioners in robotics and AI. The contribution is incremental, since it builds on existing distributed training and simulation approaches.
The paper tackles slow training in deep reinforcement learning by replacing CPU-based simulation with GPU-accelerated simulation. With one GPU and one CPU core, it trains the Humanoid running task in under 20 minutes, using 10-1000x fewer CPU cores than prior methods.
Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples to learn complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups in learning various continuous-control locomotion tasks. With a single GPU and a single CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10-1000x fewer CPU cores than previous works. We also demonstrate the scalability of our simulator to multi-GPU settings to train more challenging locomotion tasks.
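The idea of simulating many environments in a single batched step, rather than one environment per CPU process, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class `BatchedPointMassSim`, its 1-D point-mass dynamics, the reward, and the placeholder controller are hypothetical and are not the NVIDIA Flex API or the tasks used in the paper. NumPy on the CPU stands in for GPU arrays, so the sketch shows only the data layout (one array row per environment), not the speed-up.

```python
import numpy as np


class BatchedPointMassSim:
    """Toy batched simulator: num_envs independent 1-D unit point masses.

    Hypothetical illustration only; this is not the NVIDIA Flex API.
    """

    def __init__(self, num_envs, dt=0.01, seed=0):
        self.num_envs = num_envs
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.pos = np.zeros(num_envs)
        self.vel = np.zeros(num_envs)

    def reset(self):
        # Random start positions, zero velocity, for every environment at once.
        self.pos = self.rng.uniform(-1.0, 1.0, size=self.num_envs)
        self.vel = np.zeros(self.num_envs)
        return self._obs()

    def _obs(self):
        # One observation row per environment: shape (num_envs, 2).
        return np.stack([self.pos, self.vel], axis=1)

    def step(self, actions):
        # actions has shape (num_envs,): a clipped force on each unit mass.
        actions = np.clip(actions, -1.0, 1.0)
        # Semi-implicit Euler update applied to all environments together.
        self.vel = self.vel + actions * self.dt
        self.pos = self.pos + self.vel * self.dt
        # Assumed reward: negative distance from the origin.
        rewards = -np.abs(self.pos)
        return self._obs(), rewards


sim = BatchedPointMassSim(num_envs=1024)
obs = sim.reset()
for _ in range(100):
    # Placeholder policy (a simple PD controller), evaluated on the whole batch.
    actions = -2.0 * obs[:, 0] - 1.0 * obs[:, 1]
    obs, rewards = sim.step(actions)
```

Because the policy and the simulator both operate on whole batches, a single process drives all 1024 environments. In a GPU-based setup, the same arrays would live in device memory, so observations would not need to be gathered from many separate CPU workers.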