A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs
This work addresses exploration in reinforcement learning environments that differ across episodes. It provides a conceptual framework and an improved algorithm, though the contribution is incremental in that it refines existing bonus methods.
The study investigates global and episodic novelty bonuses for exploration in contextual MDPs. It finds that episodic bonuses work best when little structure is shared across episodes, while global bonuses work better when more structure is shared. Combining the two yields an algorithm that sets a new state of the art on 16 MiniHack tasks and performs robustly on Habitat and Montezuma's Revenge.
Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the agent's entire training experience, and episodic novelty bonuses, computed using only experience from the current episode. However, the use of these two types of bonuses has been ad-hoc and poorly understood. In this work, we shed light on the behavior of these two types of bonuses through controlled experiments on easily interpretable tasks as well as challenging pixel-based settings. We find that the two types of bonuses succeed in different settings, with episodic bonuses being most effective when there is little shared structure across episodes and global bonuses being effective when more structure is shared. We develop a conceptual framework which makes this notion of shared structure precise by considering the variance of the value function across contexts, and which provides a unifying explanation of our empirical results. We furthermore find that combining the two bonuses can lead to more robust performance across different degrees of shared structure, and investigate different algorithmic choices for defining and combining global and episodic bonuses based on function approximation. This results in an algorithm which sets a new state of the art across 16 tasks from the MiniHack suite used in prior work, and also performs robustly on Habitat and Montezuma's Revenge.
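To make the distinction between the two bonus types concrete, the following is a minimal tabular sketch, not the algorithm developed in this work. It assumes hashable states and a count-based bonus of the form 1/sqrt(N(s)); the names `CountBonus` and `run_episodes` are hypothetical, and the product used to combine the two bonuses is just one possible choice made for illustration. The only difference between the two bonuses is when the counts are reset: never for the global bonus, at the start of every episode for the episodic one.

```python
import math
from collections import Counter


class CountBonus:
    """Count-based novelty bonus 1/sqrt(N(s)) (an illustrative assumption)."""

    def __init__(self):
        self.counts = Counter()

    def update(self, state):
        # Record the visit, then return the bonus for the updated count.
        self.counts[state] += 1
        return 1.0 / math.sqrt(self.counts[state])

    def reset(self):
        self.counts.clear()


def run_episodes(episodes):
    """Return (global, episodic, combined) bonuses for every visited state.

    `episodes` is a list of state sequences; states must be hashable.
    """
    global_bonus = CountBonus()    # counts persist over all training experience
    episodic_bonus = CountBonus()  # counts are cleared at each episode start
    all_bonuses = []
    for episode in episodes:
        episodic_bonus.reset()
        episode_bonuses = []
        for state in episode:
            g = global_bonus.update(state)
            e = episodic_bonus.update(state)
            # Multiplicative combination: an assumption for this sketch.
            episode_bonuses.append((g, e, g * e))
        all_bonuses.append(episode_bonuses)
    return all_bonuses


bonuses = run_episodes([["a", "b", "a"], ["a", "c"]])
```

In this toy run, state `"a"` at the start of the second episode receives the full episodic bonus because the episodic counts were reset, while its global bonus has already decayed from the visits in the first episode.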