ML, AI, LG | May 9, 2017

Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

arXiv:1705.03562v1 | 5 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of fast adaptation in reinforcement learning for tasks with varying dynamics. The advance appears incremental, since it builds on existing model-based and meta-learning approaches.

The authors tackled the problem of meta-reinforcement learning by introducing Deep Episodic Value Iteration (DEVI), a model-based algorithm that achieves one-shot transfer to changes in reward and transition structure, even in high-dimensional state spaces.

We present a new deep meta reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. Our model is trained end-to-end via back-propagation. Despite being trained using the model-free Q-learning objective, we show that DEVI's model-based internal structure provides 'one-shot' transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.
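
To make the idea of planning over a non-parametric model more concrete, here is a minimal sketch of similarity-weighted value iteration over a memory of stored transitions. It is not the authors' implementation. DEVI learns its similarity metric with a deep network trained end-to-end, whereas this sketch assumes a fixed Gaussian-style kernel on raw state vectors. The names `softmax_similarity`, `q_values`, `plan`, and `chain_memory` are hypothetical, and the 5-state chain is a toy task invented for illustration. The sketch also assumes every action appears at least once in the memory.

```python
import numpy as np

def softmax_similarity(queries, keys, temperature=1.0):
    """Row-normalised similarity of each query to each key.

    Assumption: a fixed kernel on raw vectors stands in for the
    learned deep similarity metric described in the abstract.
    """
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def q_values(queries, states, actions, rewards, v_next,
             n_actions, gamma=0.9, temperature=1.0):
    """Q(query, a) as a similarity-weighted average of one-step
    backups over the stored transitions that took action a."""
    cols = []
    for a in range(n_actions):
        ix = np.flatnonzero(actions == a)
        k = softmax_similarity(queries, states[ix], temperature)
        cols.append(k @ (rewards[ix] + gamma * v_next[ix]))
    return np.stack(cols, axis=1)

def plan(states, actions, rewards, next_states, n_actions,
         gamma=0.9, n_iters=50, temperature=1.0):
    """Value iteration restricted to the stored successor states.

    Similarities are recomputed on every sweep for brevity; they
    could be cached, since the memory does not change while planning.
    """
    v_next = np.zeros(len(states))
    for _ in range(n_iters):
        q = q_values(next_states, states, actions, rewards, v_next,
                     n_actions, gamma, temperature)
        v_next = q.max(axis=1)
    return v_next

def chain_memory(goal, n=5):
    """Toy memory (hypothetical): every (state, action) pair of an
    n-state chain, action 0 = left, 1 = right, reward 1 on entering goal."""
    s, a = np.meshgrid(np.arange(n), np.arange(2), indexing="ij")
    s, a = s.ravel(), a.ravel()
    s_next = np.clip(s + 2 * a - 1, 0, n - 1)
    r = (s_next == goal).astype(float)
    return s[:, None].astype(float), a, r, s_next[:, None].astype(float)
```

In this sketch, changing the reward or transition structure only means swapping the contents of the memory and calling `plan` again. No parameters are retrained, which is the sense in which a model-based internal structure can support the kind of 'one-shot' transfer the abstract describes.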
