How to Make Deep RL Work in Practice
This work addresses the challenge of making deep RL reliable enough for large-scale real-world applications. It is incremental in that it focuses on optimizing existing methods rather than introducing new paradigms.
The paper tackles unreliable performance and poor reproducibility in deep reinforcement learning (RL) by investigating how implementation details such as initialization, input normalization, and adaptive learning techniques affect state-of-the-art algorithms. It offers practical suggestions for which techniques to use by default and identifies areas that need RL-specific solutions.
In recent years, challenging control problems have become solvable with deep reinforcement learning (RL). To use RL for large-scale real-world applications, a certain degree of reliability in its performance is necessary. However, reported results of state-of-the-art algorithms are often difficult to reproduce. One reason is that certain implementation details influence performance significantly, yet these details are commonly not highlighted as important for achieving state-of-the-art performance. Additionally, techniques from supervised learning are often used by default but influence the algorithms in a reinforcement learning setting in different and not well-understood ways. In this paper, we investigate the influence of certain initialization, input normalization, and adaptive learning techniques on the performance of state-of-the-art RL algorithms. We suggest which of those techniques to use by default and highlight areas that could benefit from a solution specifically tailored to RL.
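To make one of the implementation details named above concrete, the following is a minimal sketch of input (observation) normalization with running statistics. It is an illustration under stated assumptions, not the method evaluated in the paper: the class name `RunningNormalizer`, the parameters `eps` and `clip`, and the choice of Welford's online algorithm are all hypothetical choices made for this example. Plain Python is used so the sketch runs without dependencies.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: per-feature running mean/variance (Welford's algorithm)."""

    def __init__(self, dim, eps=1e-8, clip=5.0):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps         # assumed small constant to avoid division by zero
        self.clip = clip       # assumed clipping range for normalized values

    def update(self, obs):
        """Fold one observation (a sequence of floats) into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        """Return the observation shifted, scaled, and clipped to [-clip, clip]."""
        out = []
        for i, x in enumerate(obs):
            # Population variance; fall back to 1.0 before any data is seen.
            var = self.m2[i] / self.count if self.count > 0 else 1.0
            z = (x - self.mean[i]) / math.sqrt(var + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))
        return out
```

In an RL loop, one would typically call `update` on each observation collected from the environment and pass `normalize(obs)` to the policy. Unlike in supervised learning, the input distribution shifts as the policy changes, so the statistics are non-stationary. This is one way a default technique can behave differently in an RL setting.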