ML AI LG NE | Jun 8, 2016

Deep Successor Reinforcement Learning

Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman

arXiv:1606.02396v1 | 30.7 | 231 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses value function learning in reinforcement learning (RL) for applications such as gaming and navigation. It appears incremental because it builds on the existing Successor Representations (SR) framework.

The paper introduces DSR, a method that generalizes Successor Representations within an end-to-end deep reinforcement learning framework for learning robust value functions. It demonstrates the approach on grid-world (MazeBase) and Doom environments using raw pixel observations.

Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations -- simple grid-world domains (MazeBase) and the Doom game engine.
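The decomposition described in the abstract can be illustrated in the tabular case. The following is a minimal sketch, not the authors' deep architecture: it assumes a small, hypothetical 4-state chain with a known transition matrix `P` under a fixed policy, computes the successor map in closed form as M = (I - gamma P)^-1, and recovers state values as the product of the successor map and a reward vector. The names `successor_map`, `P`, `r`, and `r_new` are illustrative assumptions and do not come from the paper or its code.

```python
import numpy as np

def successor_map(P, gamma):
    """Tabular successor map: M = sum_t gamma^t P^t = (I - gamma P)^-1.

    M[s, s'] is the expected discounted number of future visits to s'
    when starting from s and following the policy that induces P.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

# Hypothetical 4-state chain 0 -> 1 -> 2 -> 3, where state 3 is absorbing.
P = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
gamma = 0.9
M = successor_map(P, gamma)

# Reward vector: reward 1 for occupying state 3 (assumed for illustration).
r = np.array([0.0, 0.0, 0.0, 1.0])
V = M @ r  # value of each state = inner product of its SR row with r

# If the reward moves, only r changes; the successor map M is reused as-is.
r_new = np.array([0.0, 1.0, 0.0, 0.0])
V_new = M @ r_new

print(V)
print(V_new)
```

Because the dynamics live entirely in `M` and the reward entirely in `r`, the second value computation needs no relearning of the successor map, which is the factorization the abstract credits for sensitivity to distal reward changes.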
