LG · ML · Jun 10, 2020

What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study

arXiv:2006.05990v1 · 276 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a reproducibility and progress bottleneck for researchers and practitioners in reinforcement learning, though it is incremental in nature as it synthesizes existing choices rather than introducing new methods.

The paper tackles the problem of inconsistent implementations and undocumented design choices in on-policy reinforcement learning, conducting a large-scale empirical study with over 250,000 agents across five environments to provide insights and recommendations.

In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations make numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancies between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement >50 such "choices" in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250,000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for on-policy training of RL agents.

Code Implementations: 3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
