LG, ML · Apr 4, 2018

Information Maximizing Exploration with a Latent Dynamics Model

arXiv:1804.01238v1 · 13 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of inefficient exploration in deep reinforcement learning for continuous control tasks, offering a theoretically grounded method that could enhance learning efficiency in high-dimensional environments.

The paper tackles the exploration-exploitation trade-off in reinforcement learning by using a latent dynamics model to derive reward bonuses, which serve as intrinsic motivation for model-free methods. The method is evaluated on several continuous control tasks with a focus on exploration; the abstract reports no specific performance numbers.

All reinforcement learning algorithms must handle the trade-off between exploration and exploitation. Many state-of-the-art deep reinforcement learning methods use noise in the action selection, such as Gaussian noise in policy gradient methods or $ε$-greedy in Q-learning. While these methods are appealing due to their simplicity, they do not explore the state space in a methodical manner. We present an approach that uses a model to derive reward bonuses as a means of intrinsic motivation to improve model-free reinforcement learning. A key insight of our approach is that this dynamics model can be learned in the latent feature space of a value function, representing the dynamics of the agent and the environment. This method is both theoretically grounded and computationally advantageous, permitting the efficient use of Bayesian information-theoretic methods in high-dimensional state spaces. We evaluate our method on several continuous control tasks, focusing on improving exploration.
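To make the idea of an information-based reward bonus concrete, here is a minimal sketch. It is not the paper's model: it assumes a Bayesian linear-Gaussian dynamics model over a latent input vector (for example, value-function features concatenated with the action), for which the expected information gain about the weights has a closed form. The class name `LatentBayesLinearDynamics`, the function `augmented_reward`, and the parameters `noise_var`, `prior_var`, and `eta` are all hypothetical names chosen for illustration.

```python
import numpy as np


class LatentBayesLinearDynamics:
    """Illustrative Bayesian linear model z' ~ N(W^T x, noise_var * I).

    x is a latent input (assumed: value-function features plus action).
    Each column of W has an independent Gaussian prior N(0, prior_var * I).
    This is a simplification for illustration, not the paper's model.
    """

    def __init__(self, in_dim, out_dim, noise_var=1.0, prior_var=1.0):
        self.noise_var = noise_var
        self.out_dim = out_dim
        # Posterior precision shared by every output dimension.
        self.precision = np.eye(in_dim) / prior_var
        # Accumulated x y^T / noise_var, used for the posterior mean.
        self.xty = np.zeros((in_dim, out_dim))

    def predict(self, x):
        """Posterior-mean prediction of the next latent state."""
        mean_w = np.linalg.solve(self.precision, self.xty)
        return x @ mean_w

    def info_gain(self, x):
        """Expected information gain (in nats) about W from observing at x."""
        # x^T Sigma x, computed without forming the inverse explicitly.
        var = x @ np.linalg.solve(self.precision, x)
        return 0.5 * self.out_dim * np.log1p(var / self.noise_var)

    def update(self, x, y):
        """Condition the posterior on one observed transition (x, y)."""
        self.precision += np.outer(x, x) / self.noise_var
        self.xty += np.outer(x, y) / self.noise_var


def augmented_reward(r_ext, bonus, eta=0.1):
    """Extrinsic reward plus a scaled intrinsic bonus (eta is an assumed weight)."""
    return r_ext + eta * bonus
```

In this linear-Gaussian setting the bonus depends only on the input `x`, and it shrinks as the same region of latent space is visited repeatedly, which is the behaviour an exploration bonus should have. A model-free learner would then be trained on `augmented_reward(r, model.info_gain(x))` instead of the raw reward `r`.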

