A Laplacian Framework for Option Discovery in Reinforcement Learning
The paper addresses two central challenges in RL, representation learning and option discovery, and provides a method for task-agnostic option discovery. The contribution appears incremental, since it builds on existing proto-value function approaches.
The paper tackles option discovery in reinforcement learning by showing how proto-value functions (PVFs) implicitly define options through eigenpurposes, which are intrinsic reward functions derived from the learned representation. The discovered options traverse the principal directions of the state space and act at different time scales, which makes them useful for exploration. They are demonstrated in tabular domains and in Atari 2600 games.
Representation learning and option discovery are two of the biggest challenges in reinforcement learning (RL). Proto-value functions (PVFs) are a well-known approach for representation learning in MDPs. In this paper we address the option discovery problem by showing how PVFs implicitly define options. We do so by introducing eigenpurposes, intrinsic reward functions derived from the learned representations. The options discovered from eigenpurposes traverse the principal directions of the state space. They are useful for multiple tasks because they are discovered without taking the environment's rewards into consideration. Moreover, different options act at different time scales, making them helpful for exploration. We demonstrate features of eigenpurposes in traditional tabular domains as well as in Atari 2600 games.
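The abstract does not spell out the construction, so the following is a minimal sketch under stated assumptions, not the paper's code. It assumes a hypothetical 4-by-7 open grid world, PVFs taken as eigenvectors of the normalized graph Laplacian I - D^{-1/2} W D^{-1/2}, and one-hot (tabular) features, so that the eigenpurpose of a PVF `e` reduces to r(s, s') = e[s'] - e[s]. The option's policy is found by plain value iteration with an always-available "terminate" action of value 0, and the option stops wherever no move has positive value. The names `transition_table`, `proto_value_functions` and `eigenoption`, the discount 0.9 and the tolerance are illustrative choices.

```python
import numpy as np

# Moves in a hypothetical 4-connected grid world: up, down, left, right.
MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))


def transition_table(n_rows, n_cols):
    """next_state[s, a]: deterministic successor; bumping into a wall leaves s unchanged."""
    next_state = np.zeros((n_rows * n_cols, len(MOVES)), dtype=int)
    for s in range(n_rows * n_cols):
        r, c = divmod(s, n_cols)
        for a, (dr, dc) in enumerate(MOVES):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < n_rows and 0 <= cc < n_cols
            next_state[s, a] = rr * n_cols + cc if inside else s
    return next_state


def proto_value_functions(next_state):
    """Eigenpairs of the normalized Laplacian I - D^{-1/2} W D^{-1/2}, eigenvalues ascending."""
    n = next_state.shape[0]
    W = np.zeros((n, n))
    for s in range(n):
        for t in next_state[s]:
            if t != s:
                W[s, t] = 1.0  # undirected adjacency of the state-transition graph
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigh(L)  # (eigenvalues ascending, eigenvectors as columns)


def eigenoption(e, next_state, gamma=0.9, iters=500, tol=1e-8):
    """Greedy policy for the tabular eigenpurpose r(s, s') = e[s'] - e[s].

    A terminate action worth 0 is assumed to be available everywhere, so
    V(s) = max(max_a Q(s, a), 0); the option terminates where max_a Q(s, a) <= tol.
    """
    reward = e[next_state] - e[:, None]  # eigenpurpose reward for every (s, a)
    V = np.zeros(len(e))
    for _ in range(iters):  # value iteration
        Q = reward + gamma * V[next_state]
        V = np.maximum(Q.max(axis=1), 0.0)
    policy = Q.argmax(axis=1)
    terminates = Q.max(axis=1) <= tol
    return policy, terminates


# Usage: index 0 is the trivial eigenvector (eigenvalue 0, proportional to
# sqrt(degree)), so take the next one as the smoothest informative PVF.
next_state = transition_table(4, 7)
eigvals, eigvecs = proto_value_functions(next_state)
e = eigvecs[:, 1]
policy, terminates = eigenoption(e, next_state)
print(eigvals[:3].round(3))
print(terminates.reshape(4, 7).astype(int))
```

Passing `-e` instead of `e` yields the option that traverses the same direction the opposite way, and other columns of `eigvecs` give options along other directions of the grid.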