LG | AI | ML | Oct 3, 2019

Benchmarking Batch Deep Reinforcement Learning Algorithms

arXiv:1910.01708v1 | 214 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of evaluating batch RL algorithms for researchers. It provides a strong baseline but is incremental in nature.

The paper benchmarks recent batch deep reinforcement learning algorithms on Atari using data from a partially-trained policy, finding that many underperform compared to online DQN and the behavioral policy, and introduces a modified Batch-Constrained Q-learning algorithm that outperforms existing methods.

Widely-used deep reinforcement learning algorithms have been shown to fail in the batch setting--learning from a fixed data set without interaction with the environment. Following this result, there have been several papers showing reasonable performances under a variety of environments and batch settings. In this paper, we benchmark the performance of recent off-policy and batch reinforcement learning algorithms under unified settings on the Atari domain, with data generated by a single partially-trained behavioral policy. We find that under these conditions, many of these algorithms underperform DQN trained online with the same amount of data, as well as the partially-trained behavioral policy. To introduce a strong baseline, we adapt the Batch-Constrained Q-learning algorithm to a discrete-action setting, and show it outperforms all existing algorithms at this task.
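The abstract does not spell out how the discrete-action adaptation works. As a rough illustration only, one way to constrain Q-learning to the batch is to restrict the greedy action choice to actions that an estimated behavioral policy considers sufficiently likely. The sketch below assumes that formulation; the function name, the relative-probability rule, and the threshold value of 0.3 are illustrative assumptions, not details taken from the text above.

```python
import numpy as np


def constrained_greedy_action(q_values, behavior_probs, threshold=0.3):
    """Pick the highest-value action among those the behavioral policy supports.

    q_values: estimated Q(s, a) for each discrete action (assumed given).
    behavior_probs: estimated probability of each action under the
        behavioral policy, e.g. from a behavior-cloning model (assumed given).
    threshold: hypothetical cutoff on probability relative to the most
        likely action; 0.3 is an assumption for this sketch.
    """
    q_values = np.asarray(q_values, dtype=float)
    behavior_probs = np.asarray(behavior_probs, dtype=float)

    # Likelihood of each action relative to the most likely action.
    relative = behavior_probs / behavior_probs.max()

    # Only actions above the cutoff are eligible.
    allowed = relative > threshold

    # Mask out ineligible actions so argmax can never select them.
    masked_q = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked_q))


if __name__ == "__main__":
    q = [1.0, 5.0, 2.0]
    probs = [0.6, 0.05, 0.35]
    # Action 1 has the largest Q-value but is rarely seen in the batch.
    print(constrained_greedy_action(q, probs))
    # With a threshold of 0 the constraint vanishes and plain argmax applies.
    print(constrained_greedy_action(q, probs, threshold=0.0))
```

With a threshold below 1 the most likely action is always eligible, so the selection is never empty. Setting the threshold to 0 recovers ordinary greedy Q-learning action selection, while values close to 1 make the choice approach behavior cloning.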

Code Implementations: 5 repos