LG · ML · Apr 4, 2018

Information Maximizing Exploration with a Latent Dynamics Model

Trevor Barron, Oliver Obst, Heni Ben Amor

arXiv:1804.01238v1 · 7.9 · 13 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of inefficient exploration in deep reinforcement learning for continuous control tasks, offering a theoretically grounded method that could enhance learning efficiency in high-dimensional environments.

The paper tackles the exploration-exploitation trade-off in reinforcement learning by using a latent dynamics model to derive intrinsic-motivation reward bonuses that improve model-free methods. The method is evaluated on continuous control tasks and shows improved exploration, although the abstract reports no specific performance numbers.

All reinforcement learning algorithms must handle the trade-off between exploration and exploitation. Many state-of-the-art deep reinforcement learning methods use noise in the action selection, such as Gaussian noise in policy gradient methods or $\epsilon$-greedy in Q-learning. While these methods are appealing due to their simplicity, they do not explore the state space in a methodical manner. We present an approach that uses a model to derive reward bonuses as a means of intrinsic motivation to improve model-free reinforcement learning. A key insight of our approach is that this dynamics model can be learned in the latent feature space of a value function, representing the dynamics of the agent and the environment. This method is both theoretically grounded and computationally advantageous, permitting the efficient use of Bayesian information-theoretic methods in high-dimensional state spaces. We evaluate our method on several continuous control tasks, focusing on improving exploration.
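The abstract does not give the exact form of the model or the bonus, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a Bayesian linear-Gaussian dynamics model that predicts the next latent feature vector from the current latent features concatenated with the action, and it uses as the reward bonus the information gain about the model weights (the reduction in posterior entropy) from observing a transition. All names here (`LatentBayesianDynamics`, `latent_features`, `shaped_reward`, `prior_var`, `noise_var`, `eta`) are hypothetical, and a fixed random `tanh` projection stands in for the value network's hidden layer.

```python
import numpy as np


class LatentBayesianDynamics:
    """Hypothetical Bayesian linear dynamics model in a latent feature space.

    Models z' = W^T x + noise with x = [z, a], a zero-mean Gaussian prior
    N(0, prior_var * I) on each output column of W, and known noise variance.
    All output dimensions share one posterior covariance because they use
    the same inputs, prior, and noise level.
    """

    def __init__(self, in_dim, out_dim, prior_var=1.0, noise_var=0.1):
        self.out_dim = out_dim
        self.noise_var = noise_var
        self.precision = np.eye(in_dim) / prior_var  # posterior precision A
        self.b = np.zeros((in_dim, out_dim))         # X^T Y / noise_var

    def info_gain(self, x):
        """Posterior entropy reduction from observing a transition at input x.

        For a linear-Gaussian model with known noise this equals
        0.5 * out_dim * log(1 + x^T A^{-1} x / noise_var); it does not
        depend on the observed next state.
        """
        cov = np.linalg.inv(self.precision)
        quad = float(x @ cov @ x)
        return 0.5 * self.out_dim * np.log1p(quad / self.noise_var)

    def update(self, x, z_next):
        """Exact Bayesian update with one observed transition (x, z_next)."""
        self.precision += np.outer(x, x) / self.noise_var
        self.b += np.outer(x, z_next) / self.noise_var

    def mean_weights(self):
        """Posterior mean of W (zero prior mean)."""
        return np.linalg.solve(self.precision, self.b)


def latent_features(state, proj):
    # Assumption: stand-in for the value network's hidden-layer activations.
    return np.tanh(proj @ state)


def shaped_reward(extrinsic, bonus, eta=0.1):
    # Extrinsic reward plus a scaled intrinsic bonus; eta is a hypothetical weight.
    return extrinsic + eta * bonus


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state_dim, latent_dim, action_dim = 6, 4, 2
    proj = rng.normal(size=(latent_dim, state_dim))
    model = LatentBayesianDynamics(latent_dim + action_dim, latent_dim)

    s, a = rng.normal(size=state_dim), rng.normal(size=action_dim)
    s_next = rng.normal(size=state_dim)
    x = np.concatenate([latent_features(s, proj), a])

    before = model.info_gain(x)
    for _ in range(20):
        model.update(x, latent_features(s_next, proj))
    after = model.info_gain(x)
    print(before > after)  # True: a repeatedly visited transition earns a smaller bonus
    print(shaped_reward(1.0, after))
```

The design choice this illustrates is the one the abstract highlights: because the dynamics model lives in a low-dimensional latent space rather than the raw state space, the Bayesian posterior and its information gain reduce to small closed-form matrix operations, while transitions the model has already seen many times yield a shrinking exploration bonus.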

