UCB Exploration via Q-Ensembles
This work addresses exploration in deep reinforcement learning and reports significant gains over existing methods on the Atari benchmark.
The paper tackles the problem of effective exploration in deep reinforcement learning by proposing an exploration strategy based on upper-confidence bounds computed from an ensemble of Q-functions, and reports significant gains on the Atari benchmark.
We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the $Q$-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
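
As a rough illustration of the UCB idea, the sketch below picks the action that maximizes the ensemble mean of the Q-estimates plus a scaled measure of ensemble disagreement. The function name `ucb_action`, the coefficient `lam`, and the choice of the ensemble standard deviation as the confidence width are assumptions made for this example; the text above does not specify them.

```python
from statistics import mean, pstdev


def ucb_action(q_values, lam=0.1):
    """Pick an action by an upper-confidence bound over an ensemble.

    q_values: list of K lists, one per ensemble member (hypothetical layout),
              each holding one Q-estimate per action for the current state.
    lam:      assumed exploration coefficient scaling the ensemble spread.
    """
    num_actions = len(q_values[0])
    scores = []
    for a in range(num_actions):
        # Collect every ensemble member's estimate for action a.
        estimates = [member[a] for member in q_values]
        # Optimistic score: mean estimate plus scaled disagreement.
        scores.append(mean(estimates) + lam * pstdev(estimates))
    # Ties resolve to the lowest action index.
    return max(range(num_actions), key=scores.__getitem__)


# Two ensemble members, two actions: both actions have mean 1.0,
# but the members disagree only about action 1.
q = [[1.0, 0.0], [1.0, 2.0]]
chosen = ucb_action(q, lam=0.5)
```

With `lam=0.5`, the disagreement on action 1 raises its score above that of action 0, so the sketch prefers the action the ensemble is less certain about. Setting `lam=0` reduces the rule to greedy selection on the ensemble mean.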