LG AI Jul 15, 2019

PPO Dash: Improving Generalization in Deep Reinforcement Learning

arXiv:1907.06704v3, 3 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses generalization in reinforcement learning and is aimed at researchers, though it appears incremental because it builds on existing methods.

The paper tackled overfitting in deep reinforcement learning by testing improvements to the PPO algorithm on the Obstacle Tower Challenge, achieving state-of-the-art performance.

Deep reinforcement learning is prone to overfitting, and traditional benchmarks such as the Atari 2600 benchmark can exacerbate this problem. The Obstacle Tower Challenge addresses this by using randomized environments and separate seeds for training, validation, and test runs. This paper examines various improvements and best practices for the PPO algorithm, using the Obstacle Tower Challenge to empirically study their impact on generalization. Our experiments show that combining them provides state-of-the-art performance on the Obstacle Tower Challenge.

Code Implementations: 1 repo