LG · AI · RO · ML · May 31, 2016

VIME: Variational Information Maximizing Exploration

arXiv:1605.09674v4 · 297 citations
Originality: Highly original
AI Analysis

This paper addresses effective exploration in high-dimensional deep RL, offering researchers and practitioners a novel approach that goes beyond simple heuristics.

The paper tackles scalable exploration in reinforcement learning by introducing VIME, an exploration strategy based on maximizing information gain about environment dynamics. VIME significantly outperforms heuristic exploration methods, such as epsilon-greedy and Gaussian control noise, across a range of continuous control tasks, including those with very sparse rewards.

Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.
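The abstract states that VIME modifies the MDP reward function using information gain about the dynamics belief, but it does not spell out the formula. The sketch below is one way to read that idea, under stated assumptions: the belief over dynamics-model weights is a fully factorized Gaussian, information gain is approximated by the KL divergence between the belief after and before an update on one transition, and the bonus is added to the extrinsic reward with a trade-off coefficient. The names `kl_diag_gaussians`, `augmented_reward`, and `eta` are hypothetical, chosen for illustration only.

```python
import math


def kl_diag_gaussians(mu_new, sigma_new, mu_old, sigma_old):
    """KL[N(mu_new, diag(sigma_new^2)) || N(mu_old, diag(sigma_old^2))].

    Assumption: the belief over model weights is a factorized Gaussian,
    so the KL divergence is a sum of independent per-weight terms.
    """
    total = 0.0
    for m1, s1, m0, s0 in zip(mu_new, sigma_new, mu_old, sigma_old):
        var_ratio = (s1 / s0) ** 2
        total += var_ratio + ((m1 - m0) / s0) ** 2 - 1.0 - math.log(var_ratio)
    return 0.5 * total


def augmented_reward(extrinsic_reward, info_gain, eta=0.1):
    """Extrinsic reward plus an exploration bonus scaled by eta.

    Assumption: eta is an illustrative trade-off coefficient, not a
    value taken from the paper.
    """
    return extrinsic_reward + eta * info_gain


# Belief before and after a (hypothetical) update on one transition.
mu_old, sigma_old = [0.0, 0.0], [1.0, 1.0]
mu_new, sigma_new = [1.0, 1.0], [1.0, 1.0]

gain = kl_diag_gaussians(mu_new, sigma_new, mu_old, sigma_old)
reward = augmented_reward(0.0, gain, eta=0.1)
```

In this reading, a transition that shifts the belief strongly earns a large bonus even when the extrinsic reward is zero, which is what would make such a scheme useful under sparse rewards. Because only the reward changes, the underlying RL algorithm can stay as it is.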
