LG | ML | Jun 5, 2017

UCB Exploration via Q-Ensembles

arXiv:1706.01502v3 | 91 citations
Originality: Incremental advance
AI Analysis

This work addresses exploration challenges in deep reinforcement learning, showing incremental improvements over existing methods.

The paper tackles the problem of effective exploration in deep reinforcement learning by proposing an exploration strategy based on upper-confidence bounds (UCB) computed from an ensemble of Q-functions. The authors report significant gains on the Atari benchmark.

We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting, and adapt them to the $Q$-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
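To make the idea of a UCB rule over a Q-ensemble concrete, here is a minimal sketch of action selection. It assumes the bound is formed as the ensemble mean plus a multiple of the ensemble standard deviation for each action. The function name `ucb_action`, the coefficient `lam`, and the toy Q-values are hypothetical illustrations, not taken from the abstract.

```python
import numpy as np

def ucb_action(q_values: np.ndarray, lam: float = 0.1) -> int:
    """Pick an action from ensemble Q-estimates with a UCB-style rule.

    q_values has shape (K, A): K ensemble members, A actions, all
    evaluated at the same state. `lam` (hypothetical name) scales the
    exploration bonus; lam = 0 reduces to greedy on the ensemble mean.
    """
    mean = q_values.mean(axis=0)   # per-action ensemble mean
    std = q_values.std(axis=0)     # per-action ensemble disagreement
    return int(np.argmax(mean + lam * std))

# Toy example: 3 ensemble members, 3 actions (made-up numbers).
q = np.array([[1.0, 0.5, 0.2],
              [1.0, 1.5, 0.1],
              [1.0, 0.1, 0.3]])

greedy = ucb_action(q, lam=0.0)      # ignores disagreement
optimistic = ucb_action(q, lam=2.0)  # rewards disagreement
print(greedy, optimistic)
```

With these numbers the ensemble agrees on action 0 and disagrees strongly on action 1, so the greedy choice is action 0, while a large enough bonus shifts the choice to action 1, the action whose value is least certain.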

