Information Content Exploration
This work addresses the problem of efficient exploration for reinforcement learning agents in sparse reward environments, which is a known bottleneck, and shows incremental improvements over existing methods.
The paper tackles the challenge of sparse reward environments in reinforcement learning by proposing a new intrinsic reward that quantifies exploratory behavior and promotes state coverage by maximizing the information content of trajectories. The method outperforms existing techniques such as Curiosity Driven Learning and Random Network Distillation in various games, including Montezuma's Revenge, and an extension improves sample efficiency and generalizes to continuous state spaces.
Sparse reward environments are known to be challenging for reinforcement learning agents. In such environments, efficient and scalable exploration is crucial. Exploration is a means by which an agent gains information about the environment. We expand on this topic and propose a new intrinsic reward that systematically quantifies exploratory behavior and promotes state coverage by maximizing the information content of a trajectory taken by an agent. We compare our method to alternative exploration-based intrinsic reward techniques, namely Curiosity Driven Learning and Random Network Distillation. We show that our information-theoretic reward induces efficient exploration and outperforms these baselines in various games, including Montezuma's Revenge, a known difficult task for reinforcement learning. Finally, we propose an extension that maximizes information content in a discretely compressed latent space, which boosts sample efficiency and generalizes to continuous state spaces.
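To make the idea of "information content of a trajectory" concrete, the following is a minimal sketch under one assumed reading of the abstract. It is not the paper's actual formulation, which the abstract does not specify. It assumes information content is measured as the Shannon entropy of the empirical state-visit distribution along a trajectory, and that the per-step intrinsic reward is the change in that entropy when a new state is appended. The function names trajectory_entropy and intrinsic_rewards are hypothetical, and states are assumed to be hashable (for example, discretized observations).

```python
import math
from collections import Counter


def trajectory_entropy(states):
    """Shannon entropy (in nats) of the empirical state-visit distribution.

    ASSUMPTION: this entropy stands in for the "information content" of a
    trajectory; the source abstract does not give the exact definition.
    """
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def intrinsic_rewards(states):
    """Per-step reward: entropy gain from appending each state in turn.

    ASSUMPTION: rewarding the entropy difference is an illustrative choice.
    The rewards telescope, so their sum equals the final trajectory entropy.
    """
    rewards = []
    prev = 0.0
    for t in range(1, len(states) + 1):
        cur = trajectory_entropy(states[:t])
        rewards.append(cur - prev)
        prev = cur
    return rewards


if __name__ == "__main__":
    revisiting = ["s0", "s1", "s0", "s1", "s0", "s1"]
    covering = ["s0", "s1", "s2", "s3", "s4", "s5"]
    print("revisiting:", trajectory_entropy(revisiting))
    print("covering:  ", trajectory_entropy(covering))
```

Under this reading, a trajectory that spreads its visits over many distinct states scores higher than one that cycles among a few, which matches the stated goal of promoting state coverage. For continuous observations, the states would first need to be mapped to discrete codes, in the spirit of the discretely compressed latent space mentioned in the abstract.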