LG, Mar 6, 2017

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

arXiv:1703.01732v1 · 24.0 · 251 citations

Originality: Incremental advance

AI Analysis

For AI researchers, this work addresses insufficient exploration in complex domains. It offers a scalable solution that is an incremental advance over prior heuristic methods.

The paper tackles exploration in deep reinforcement learning for tasks with sparse rewards. It proposes surprise-based intrinsic motivation: intrinsic rewards that approximate the KL-divergence of the true transition probabilities from a model learned concurrently with the policy. It reports that these incentives outperform several other heuristic exploration techniques on continuous control tasks and games in the Atari RAM domain.

Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. Recent successes in deep reinforcement learning have been achieved mostly using simple heuristic exploration strategies such as $ε$-greedy action selection or Gaussian control noise, but there are many tasks where these methods are insufficient to make any learning progress. Here, we consider more complex heuristics: efficient and scalable exploration strategies that maximize a notion of an agent's surprise about its experiences via intrinsic motivation. We propose to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model. One of our approximations results in using surprisal as intrinsic motivation, while the other gives the $k$-step learning progress. We show that our incentives enable agents to succeed in a wide range of environments with high-dimensional state spaces and very sparse rewards, including continuous control tasks and games in the Atari RAM domain, outperforming several other heuristic exploration techniques.
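The two bonuses named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-parameter Gaussian transition model, the class and function names, and the bonus coefficient `eta` are all hypothetical choices made for illustration. Surprisal is taken here as the negative log-probability the learned model assigns to an observed transition. The $k$-step learning progress is taken as the gain in that log-probability between the current model and a snapshot from $k$ updates earlier.

```python
import copy
import math

LOG_2PI = math.log(2.0 * math.pi)


class ToyTransitionModel:
    """Hypothetical toy model: s' ~ N(s + w * a, std^2 I) with one scalar weight w."""

    def __init__(self, w=0.0, std=1.0):
        self.w = w
        self.std = std

    def log_prob(self, s, a, s_next):
        """Log-density the model assigns to the transition (s, a) -> s_next."""
        total = 0.0
        for si, ai, ni in zip(s, a, s_next):
            z = (ni - (si + self.w * ai)) / self.std
            total += -0.5 * z * z - math.log(self.std) - 0.5 * LOG_2PI
        return total

    def update(self, s, a, s_next, lr=0.1):
        """One gradient-ascent step on the log-likelihood with respect to w."""
        grad = sum((ni - si - self.w * ai) * ai for si, ai, ni in zip(s, a, s_next))
        self.w += lr * grad / self.std ** 2


def surprisal_bonus(model, s, a, s_next):
    """Surprisal: large when the model finds the transition unlikely."""
    return -model.log_prob(s, a, s_next)


def learning_progress_bonus(new_model, old_model, s, a, s_next):
    """Learning progress: how much more likely the newer model finds the transition."""
    return new_model.log_prob(s, a, s_next) - old_model.log_prob(s, a, s_next)


def shaped_reward(extrinsic, bonus, eta=0.01):
    """Extrinsic reward plus a scaled intrinsic bonus (eta is an assumed coefficient)."""
    return extrinsic + eta * bonus


# Usage: one observed transition, with the model trained for k updates.
s, a, s_next = [0.0], [1.0], [1.0]
model = ToyTransitionModel()
snapshot = copy.deepcopy(model)  # stands in for the model from k updates ago
k = 3
for _ in range(k):
    model.update(s, a, s_next)

print("surprisal bonus:", surprisal_bonus(model, s, a, s_next))
print("k-step learning progress:", learning_progress_bonus(model, snapshot, s, a, s_next))
print("shaped reward:", shaped_reward(0.0, surprisal_bonus(model, s, a, s_next)))
```

In this sketch both bonuses shrink as the model fits the transition, so a sparse extrinsic reward is supplemented mainly in regions the model has not yet learned to predict.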
