LG · AI · RO · Nov 27, 2023

Replay across Experiments: A Natural Extension of Off-Policy RL

Dhruva Tirumala, Thomas Lampe, Jose Enrique Chen, Tuomas Haarnoja, Sandy Huang, Guy Lever, Ben Moran, Tim Hertweck, Leonard Hasenclever, Martin Riedmiller, Nicolas Heess, Markus Wulfmeier

DeepMind

arXiv:2311.15951v2 · 13.0 · 11 citations · h-index: 72

Originality: Synthesis-oriented

AI Analysis

This work addresses data efficiency and exploration challenges in reinforcement learning research, representing an incremental extension of existing off-policy RL techniques.

The paper tackles data inefficiency in reinforcement learning by extending replay across multiple experiments. This improves controller performance and shortens research iteration times for a range of RL algorithms on challenging control domains.

Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present an effective yet simple framework to extend the use of replays across multiple experiments, minimally adapting the RL workflow for sizeable improvements in controller performance and research iteration times. At its core, Replay Across Experiments (RaE) involves reusing experience from previous experiments to improve exploration and bootstrap learning while reducing required changes to a minimum in comparison to prior work. We empirically show benefits across a number of RL algorithms and challenging control domains spanning both locomotion and manipulation, including hard exploration tasks from egocentric vision. Through comprehensive ablations, we demonstrate robustness to the quality and amount of data available and various hyperparameter choices. Finally, we discuss how our approach can be applied more broadly across research life cycles and can increase resilience by reloading data across random seeds or hyperparameter variations.
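The abstract describes RaE only at a high level. The sketch below is minimal and rests on one assumption: that "reusing experience from previous experiments" means drawing each training batch partly from a fixed store of transitions saved by earlier runs and partly from the current run's replay buffer. Several details are illustrative assumptions and do not come from the authors' implementation: the class name `MixedReplay`, the `prior_fraction` parameter and its 0.5 default, and the fallback to prior data while the online buffer is still empty.

```python
import random
from collections import deque
from typing import Any, List, Sequence


class MixedReplay:
    """Hypothetical replay buffer that mixes data reloaded from earlier
    experiments with fresh data from the current run.

    `prior_fraction` (an assumed hyperparameter) is the share of each
    sampled batch drawn from the earlier experiments' data.
    """

    def __init__(self, prior_data: Sequence[Any], capacity: int,
                 prior_fraction: float = 0.5, seed: int = 0):
        if not 0.0 <= prior_fraction <= 1.0:
            raise ValueError("prior_fraction must be in [0, 1]")
        self.prior = list(prior_data)         # fixed data from previous experiments
        self.online = deque(maxlen=capacity)  # FIFO buffer for the current run
        self.prior_fraction = prior_fraction
        self.rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        # Only new experience is appended; the prior data never changes.
        self.online.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        if not self.prior and not self.online:
            raise ValueError("no data to sample from")
        n_prior = round(batch_size * self.prior_fraction)
        if not self.online:   # early in training: use prior data only
            n_prior = batch_size
        if not self.prior:    # no earlier experiments: plain online replay
            n_prior = 0
        n_online = batch_size - n_prior
        batch = [self.rng.choice(self.prior) for _ in range(n_prior)]
        batch += [self.rng.choice(self.online) for _ in range(n_online)]
        self.rng.shuffle(batch)  # avoid ordering the batch by data source
        return batch


# Usage: 100 stored transitions from an old run, 10 from the current run.
buf = MixedReplay(prior_data=[("old", i) for i in range(100)], capacity=1000)
for t in range(10):
    buf.add(("new", t))
batch = buf.sample(32)  # 16 "old" and 16 "new" transitions with the 0.5 default
```

Under this reading, the learner's update rule stays untouched and only the source of its training batches changes. That fits the abstract's emphasis on keeping changes to the RL workflow minimal.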

