LG · ML · Apr 18, 2018

A Study on Overfitting in Deep Reinforcement Learning

arXiv:1804.06893v2 · 430 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of unreliable generalization in RL, which matters as RL is applied to critical domains such as healthcare and finance, and it highlights the need for better evaluation protocols.

The study systematically investigates overfitting in deep reinforcement learning. It finds that standard RL agents can overfit even when commonly used stochasticity-adding techniques are applied, so that the same agents and learning algorithms can show drastically different test performance despite all achieving optimal training rewards.

Recent years have witnessed significant progress in deep Reinforcement Learning (RL). Empowered with large-scale neural networks, carefully designed architectures, novel training algorithms and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behaviors of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they could overfit in various ways. Moreover, overfitting could happen "robustly": commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms could have drastically different test performance, even when all of them achieve optimal rewards during training. The observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.
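
The gap between training and test performance described in the abstract can be illustrated with a minimal sketch. This is not the paper's experimental setup: the toy levels, the `MemorizingAgent` class, and the seed ranges below are all hypothetical, chosen only to show why an agent should be scored on levels it never trained on. The agent here is a lookup table keyed on the level seed, so it can only memorize.

```python
import random

N_ACTIONS = 4   # actions available at every step
HORIZON = 10    # steps per episode


def make_level(seed):
    """Hypothetical toy level: one rewarded action per step, fixed by the seed."""
    rng = random.Random(seed)
    return [rng.randrange(N_ACTIONS) for _ in range(HORIZON)]


class MemorizingAgent:
    """Lookup-table policy keyed on (level seed, step), so it can only memorize."""

    def __init__(self, rng):
        self.table = {}
        self.rng = rng

    def act(self, seed, step):
        # Fall back to a uniformly random action on unseen (seed, step) pairs.
        if (seed, step) in self.table:
            return self.table[(seed, step)]
        return self.rng.randrange(N_ACTIONS)

    def train(self, seeds):
        # Systematic trial and error: try each action once per level and step,
        # and store the one that earns the reward.
        for seed in seeds:
            level = make_level(seed)
            for step in range(HORIZON):
                for action in range(N_ACTIONS):
                    if action == level[step]:
                        self.table[(seed, step)] = action


def evaluate(agent, seeds):
    """Fraction of rewarded steps, averaged over the given levels."""
    hits = 0
    for seed in seeds:
        level = make_level(seed)
        hits += sum(agent.act(seed, t) == level[t] for t in range(HORIZON))
    return hits / (len(seeds) * HORIZON)


train_seeds = list(range(20))
test_seeds = list(range(1000, 1200))   # disjoint from the training levels

agent = MemorizingAgent(random.Random(0))
agent.train(train_seeds)

train_score = evaluate(agent, train_seeds)
test_score = evaluate(agent, test_seeds)
print(f"train: {train_score:.2f}  test: {test_score:.2f}")
```

In this sketch the training score alone says nothing about generalization: the agent is perfect on the levels it has seen and falls back to random guessing elsewhere. Reporting both scores on disjoint sets of levels is one simple form of the more careful evaluation the abstract calls for.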

Code Implementations: 1 repo