LG · Mar 6, 2017

Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning

arXiv:1703.01732v1 · 251 citations
Originality: Incremental advance
AI Analysis

The paper addresses insufficient exploration in complex reinforcement learning domains, offering a scalable solution that is an incremental advance over prior heuristic methods.

The paper tackles the challenge of exploration in deep reinforcement learning for tasks with sparse rewards. It proposes surprise-based intrinsic motivation methods whose rewards approximate the KL-divergence of the true transition probabilities from a learned transition model. It shows that these incentives outperform several other heuristic exploration techniques on continuous control tasks and games in the Atari RAM domain.

Exploration in complex domains is a key challenge in reinforcement learning, especially for tasks with very sparse rewards. Recent successes in deep reinforcement learning have been achieved mostly using simple heuristic exploration strategies such as $ε$-greedy action selection or Gaussian control noise, but there are many tasks where these methods are insufficient to make any learning progress. Here, we consider more complex heuristics: efficient and scalable exploration strategies that maximize a notion of an agent's surprise about its experiences via intrinsic motivation. We propose to learn a model of the MDP transition probabilities concurrently with the policy, and to form intrinsic rewards that approximate the KL-divergence of the true transition probabilities from the learned model. One of our approximations results in using surprisal as intrinsic motivation, while the other gives the $k$-step learning progress. We show that our incentives enable agents to succeed in a wide range of environments with high-dimensional state spaces and very sparse rewards, including continuous control tasks and games in the Atari RAM domain, outperforming several other heuristic exploration techniques.
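The two intrinsic rewards named in the abstract can be illustrated with a minimal sketch. It assumes a small discrete MDP and swaps the learned transition model for a count-based table with add-alpha smoothing, which is a stand-in for illustration and not the paper's model. The names `TabularTransitionModel`, `surprisal_bonus`, `learning_progress_bonus`, `shaped_reward`, and the coefficient `eta` are hypothetical. Surprisal is taken as the negative log-probability of the observed transition under the current model, and k-step learning progress as the gain in log-probability relative to a snapshot of the model from k updates earlier.

```python
import math
from collections import defaultdict
from copy import deepcopy


class TabularTransitionModel:
    """Count-based estimate of P(s' | s, a) with add-alpha smoothing.

    Assumption: a tabular stand-in for a learned transition model.
    """

    def __init__(self, n_states, alpha=1.0):
        self.n_states = n_states
        self.alpha = alpha
        self.counts = defaultdict(float)  # (s, a, s') -> visit count
        self.totals = defaultdict(float)  # (s, a) -> visit count

    def update(self, s, a, s_next):
        """Record one observed transition."""
        self.counts[(s, a, s_next)] += 1.0
        self.totals[(s, a)] += 1.0

    def log_prob(self, s, a, s_next):
        """Smoothed log P(s' | s, a); never -inf because alpha > 0."""
        num = self.counts.get((s, a, s_next), 0.0) + self.alpha
        den = self.totals.get((s, a), 0.0) + self.alpha * self.n_states
        return math.log(num / den)


def surprisal_bonus(model, s, a, s_next):
    """Surprisal: negative log-probability of the observed transition."""
    return -model.log_prob(s, a, s_next)


def learning_progress_bonus(model_now, model_old, s, a, s_next):
    """k-step learning progress: log-probability gain over an older snapshot."""
    return model_now.log_prob(s, a, s_next) - model_old.log_prob(s, a, s_next)


def shaped_reward(extrinsic, bonus, eta=0.1):
    """Extrinsic reward plus a scaled intrinsic bonus (eta is hypothetical)."""
    return extrinsic + eta * bonus


# Usage: the same transition becomes less surprising as the model sees it.
model = TabularTransitionModel(n_states=4)
snapshot = deepcopy(model)  # plays the role of the model from k updates ago
before = surprisal_bonus(model, s=0, a=1, s_next=2)
for _ in range(5):
    model.update(0, 1, 2)
after = surprisal_bonus(model, s=0, a=1, s_next=2)
progress = learning_progress_bonus(model, snapshot, 0, 1, 2)
```

In this sketch `after` is smaller than `before` and `progress` is positive, so both bonuses favor transitions the model has not yet learned to predict. A deep RL agent would replace the count table with a parametric model trained alongside the policy.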

