LG · ML · Apr 17, 2019

Rogue-Gym: A New Challenge for Generalization in Reinforcement Learning

arXiv:1904.08129v2 · 16 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses poor generalization in RL, but the contribution is incremental: it builds on existing research environments with procedural content generation.

The authors tackle overfitting in reinforcement learning agents by introducing Rogue-Gym, a roguelike game benchmark for evaluating generalization. Evaluating PPO with and without enhancements for generalization, they find that some enhancements fail to mitigate overfitting, while others slightly improve generalization ability.

In this paper, we propose Rogue-Gym, a simple and classic style roguelike game built for evaluating generalization in reinforcement learning (RL). Combined with the recent progress of deep neural networks, RL has successfully trained human-level agents without human knowledge in many games such as those for Atari 2600. However, it has been pointed out that agents trained with RL methods often overfit the training environment, and they work poorly in slightly different environments. To investigate this problem, some research environments with procedural content generation have been proposed. Following these studies, we propose the use of roguelikes as a benchmark for evaluating the generalization ability of RL agents. In our Rogue-Gym, agents need to explore dungeons that are structured differently each time they start a new game. Thanks to the very diverse structures of the dungeons, we believe that the generalization benchmark of Rogue-Gym is sufficiently fair. In our experiments, we evaluate a standard reinforcement learning method, PPO, with and without enhancements for generalization. The results show that some enhancements believed to be effective fail to mitigate the overfitting in Rogue-Gym, although others slightly improve the generalization ability.
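The evaluation idea in the abstract (train on some dungeons, then test on differently structured ones) can be illustrated with a minimal Python sketch. It does not use the Rogue-Gym API. The two-door corridor is a toy stand-in for a roguelike dungeon, and `make_dungeon`, `run_episode`, `memorizing_policy`, `evaluate`, and `generalization_gap` are hypothetical names introduced only for this illustration.

```python
import random
from statistics import mean

LENGTH = 8  # steps per toy dungeon (arbitrary choice for this sketch)


def make_dungeon(seed):
    """Toy procedural generation: the seed alone decides which door (0 or 1) is safe at each step."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(LENGTH)]


def run_episode(policy, seed):
    """Score = number of consecutive correct doors before the first mistake."""
    score = 0
    for step, correct_door in enumerate(make_dungeon(seed)):
        if policy(step) != correct_door:
            break
        score += 1
    return score


def memorizing_policy(train_seed):
    """Toy overfit agent: replays the door sequence of the single training dungeon."""
    table = make_dungeon(train_seed)
    return lambda step: table[step]


def evaluate(policy, seeds):
    """Mean score over a set of dungeon seeds."""
    return mean(run_episode(policy, s) for s in seeds)


def generalization_gap(policy, train_seeds, test_seeds):
    """Training score minus test score; a large gap suggests overfitting."""
    return evaluate(policy, train_seeds) - evaluate(policy, test_seeds)


train_seeds = [0]
test_seeds = list(range(100, 120))  # dungeons never seen during "training"
policy = memorizing_policy(train_seeds[0])

print("train score:", evaluate(policy, train_seeds))
print("test score:", evaluate(policy, test_seeds))
print("gap:", generalization_gap(policy, train_seeds, test_seeds))
```

The memorizing agent clears its training dungeon perfectly, and on unseen seeds it succeeds only where their layout happens to match the memorized one. That is the kind of train/test discrepancy a generalization benchmark is meant to expose.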

Code Implementations: 2 repos
