LG ML | Jun 5, 2017

UCB Exploration via Q-Ensembles

Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman

arXiv:1706.01502v3 | 25.891 citations

Originality: Incremental advance

AI Analysis

This work addresses exploration challenges in deep reinforcement learning, showing incremental improvements over existing methods.

The paper addresses effective exploration in deep reinforcement learning. It proposes an exploration strategy based on upper-confidence bounds (UCB) computed from an ensemble of Q-functions, and the authors report significant gains on the Atari benchmark.

We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the $Q$-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
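To make the idea concrete, here is a minimal sketch of UCB-style action selection over a Q-ensemble. It assumes a common form of the rule: pick the action that maximizes the ensemble mean of the Q-values plus a multiple of their ensemble standard deviation. The abstract does not spell out the exact rule, so the function name `ucb_action`, the coefficient `lam`, and the toy Q-values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def ucb_action(q_values, lam=0.1):
    """Pick an action from ensemble Q-estimates for a single state.

    q_values: array of shape (K, A), one row per ensemble member and
              one column per action (hypothetical layout).
    lam:      exploration coefficient (assumed hyperparameter); lam = 0
              reduces to greedy selection on the ensemble mean.
    """
    q_values = np.asarray(q_values, dtype=float)
    mean = q_values.mean(axis=0)  # ensemble mean per action
    std = q_values.std(axis=0)    # ensemble disagreement per action
    return int(np.argmax(mean + lam * std))


# Toy example: 3 ensemble members, 2 actions (made-up numbers).
# Action 0 has a higher mean; the members disagree strongly about action 1.
q = np.array([
    [1.0, 0.0],
    [1.0, 2.0],
    [1.0, 0.4],
])

greedy = ucb_action(q, lam=0.0)      # ignores disagreement
optimistic = ucb_action(q, lam=1.0)  # rewards disagreement
```

With `lam=0.0` the rule picks the action with the highest mean estimate. With a larger `lam` it favors the action the ensemble members disagree about, which is the optimism-under-uncertainty behavior that UCB methods borrow from the bandit setting.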
