Value of Information-Enhanced Exploration in Bootstrapped DQN
This work addresses the exploration-exploitation trade-off in reinforcement learning, offering practitioners an incremental enhancement to an existing method, Bootstrapped DQN.
The paper tackles efficient exploration in deep reinforcement learning for high-dimensional, sparse-reward environments by integrating the expected value of information (EVOI) into Bootstrapped DQN. This improves performance in complex Atari games without adding hyperparameters.
Efficient exploration in deep reinforcement learning remains a fundamental challenge, especially in environments with high-dimensional states and sparse rewards. Traditional exploration strategies that rely on random local policy noise, such as $\epsilon$-greedy and Boltzmann exploration, often struggle to balance exploration and exploitation efficiently. In this paper, we integrate the notion of (expected) value of information (EVOI) into the well-known Bootstrapped DQN algorithmic framework to enhance the algorithm's deep exploration ability. Specifically, we develop two novel algorithms that incorporate into Bootstrapped DQN the expected gain from learning, as measured by the value of information. Our methods use value of information estimates to measure the disagreement among distinct network heads and drive exploration towards areas with the most potential. We evaluate our algorithms with respect to performance and to their ability to exploit the inherent uncertainty arising from random network initialization. Our experiments in complex, sparse-reward Atari games demonstrate increased performance and better use of uncertainty, and, importantly, the methods introduce no extra hyperparameters.
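To make the idea of "value of information over head disagreement" concrete, here is a minimal sketch. It is not the authors' algorithm: it swaps in the myopic value of perfect information (VPI) criterion of Dearden, Friedman and Russell's Bayesian Q-learning. It assumes that the $K$ bootstrap heads' Q-values for the current state can be treated as samples from a posterior over Q-values. The function names `myopic_vpi` and `select_action` are hypothetical. An action's VPI is the expected improvement in the decision if we learned that action's true value. The greedy action gains only if it turns out worse than the runner-up. Any other action gains only if it turns out better than the current best. The agent then acts greedily on mean Q plus VPI, which adds no tunable hyperparameter.

```python
from statistics import mean


def myopic_vpi(head_q):
    """Myopic value of perfect information for each action (illustrative sketch).

    head_q[k][a] is the Q-value of action a under bootstrap head k.
    Assumption: the K heads are treated as samples from a posterior over Q-values.
    Returns (posterior-mean Q-values, VPI per action).
    """
    num_actions = len(head_q[0])
    if num_actions < 2:
        raise ValueError("VPI needs at least two actions to compare")
    # Posterior-mean Q-value of each action, averaged over heads.
    means = [mean(h[a] for h in head_q) for a in range(num_actions)]
    # Rank actions by mean value to find the current best and the runner-up.
    order = sorted(range(num_actions), key=lambda a: means[a], reverse=True)
    best, second = order[0], order[1]
    vpi = []
    for a in range(num_actions):
        gains = []
        for h in head_q:
            q = h[a]
            if a == best:
                # The decision changes only if the best action is worse than the runner-up.
                gains.append(max(0.0, means[second] - q))
            else:
                # The decision changes only if this action beats the current best.
                gains.append(max(0.0, q - means[best]))
        vpi.append(mean(gains))  # expectation over heads
    return means, vpi


def select_action(head_q):
    """Greedy action with respect to mean Q-value plus VPI (no extra hyperparameters)."""
    means, vpi = myopic_vpi(head_q)
    scores = [m + v for m, v in zip(means, vpi)]
    return max(range(len(scores)), key=lambda a: scores[a])


if __name__ == "__main__":
    # Three heads, two actions: all heads agree on action 0; one head is optimistic about action 1.
    heads = [[1.0, 0.0], [1.0, 0.5], [1.0, 2.0]]
    print(myopic_vpi(heads))
    print(select_action(heads))
```

In this example, action 1 has a lower mean (about 0.83 vs. 1.0). However, one head rates it well above action 0, so its VPI (1/3) lifts its score above action 0 and the agent explores it. When all heads agree, every VPI is zero and the rule reduces to ordinary greedy action selection.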