ML AI LG

May 9, 2017

Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

arXiv:1705.03562v1

2.65 citations

Originality: Incremental advance

AI Analysis

The paper addresses fast adaptation in reinforcement learning when task dynamics vary. The advance appears incremental because it builds on existing model-based and meta-learning approaches.

The authors tackled the problem of meta-reinforcement learning by introducing Deep Episodic Value Iteration (DEVI), a model-based algorithm that achieves one-shot transfer to changes in reward and transition structure, even in high-dimensional state spaces.

We present a new deep meta reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. Our model is trained end-to-end via back-propagation. Despite being trained using the model-free Q-learning objective, we show that DEVI's model-based internal structure provides 'one-shot' transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.
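To make the idea of a non-parametric, similarity-based planner concrete, here is a minimal sketch of kernel-style value iteration over a memory of stored transitions. It is not the authors' implementation: the names `softmax_similarity`, `q_values`, and `episodic_value_iteration`, the softmax-over-negative-distance kernel, and the `temperature` parameter are all assumptions made for illustration, and the `embed` function defaults to the identity where DEVI would use a learned deep network.

```python
import math


def softmax_similarity(query, keys, temperature=1.0):
    """Normalised weights from negative squared Euclidean distance (assumed kernel)."""
    scores = [-sum((q - k) ** 2 for q, k in zip(query, key)) / temperature
              for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def q_values(state, transitions, values, n_actions, gamma=0.9,
             embed=lambda s: s, temperature=0.1):
    """Q(state, a) as a similarity-weighted average over stored transitions.

    transitions: list of (state, action, reward, next_state, done) tuples.
    values[j]: current value estimate of the next_state of transition j.
    """
    z = embed(state)
    q = []
    for a in range(n_actions):
        idx = [j for j, t in enumerate(transitions) if t[1] == a]
        if not idx:
            q.append(0.0)  # no experience with this action (hypothetical default)
            continue
        weights = softmax_similarity(
            z, [embed(transitions[j][0]) for j in idx], temperature)
        q.append(sum(
            w * (transitions[j][2]
                 + gamma * (0.0 if transitions[j][4] else values[j]))
            for w, j in zip(weights, idx)))
    return q


def episodic_value_iteration(transitions, n_actions, gamma=0.9, n_iters=50,
                             embed=lambda s: s, temperature=0.1):
    """Run value iteration restricted to the next-states held in memory."""
    values = [0.0] * len(transitions)
    for _ in range(n_iters):
        values = [max(q_values(t[3], transitions, values, n_actions,
                               gamma, embed, temperature))
                  for t in transitions]
    return values
```

Because planning in this sketch reads rewards and transitions directly from memory, replacing the stored transitions and re-running `episodic_value_iteration` changes the resulting policy without retraining the embedding, which is one way to read the 'one-shot' transfer claim in the abstract.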
