AI · LG · Jul 11, 2017

Value Prediction Network

arXiv:1707.03497v2 · 350 citations
AI Analysis

This paper addresses the problem of efficient planning in reinforcement learning, offering a hybrid approach whose novelty is incremental: it combines existing model-free and model-based methods.

The paper integrates model-free and model-based reinforcement learning through the Value Prediction Network (VPN), which learns a dynamics model that predicts future values rather than future observations. VPN shows advantages over both model-free and model-based baselines in a stochastic environment, and it outperforms DQN on several Atari games even with short-lookahead planning.

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
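As a rough illustration of the short-lookahead planning the abstract mentions, the sketch below expands option-conditional predictions of reward, discount, and next abstract state to a fixed depth and backs up the resulting values. It is a minimal sketch under stated assumptions, not the authors' implementation: `ToyCore`, `q_plan`, `v_plan`, `best_option`, and the chain environment are hypothetical names invented here, hand-written functions stand in for the learned network modules, and the 1/d mixing of the direct value estimate with the planned value is an assumption that should be checked against the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = int   # stand-in for a learned abstract state (hypothetical)
Option = int  # stand-in for an option / action (hypothetical)


@dataclass
class ToyCore:
    """Hand-written stand-ins for the learned prediction modules."""
    value: Callable[[State], float]                         # V(s)
    outcome: Callable[[State, Option], Tuple[float, float]]  # (reward, discount)
    transition: Callable[[State, Option], State]            # next abstract state
    options: Sequence[Option]


def q_plan(core: ToyCore, s: State, o: Option, d: int) -> float:
    """Depth-d estimate of taking option o in abstract state s."""
    r, gamma = core.outcome(s, o)
    return r + gamma * v_plan(core, core.transition(s, o), d)


def v_plan(core: ToyCore, s: State, d: int) -> float:
    """Mix the direct value estimate with the best deeper rollout.

    Assumption: weights 1/d and (d-1)/d, with plain V(s) at depth 1.
    """
    v = core.value(s)
    if d <= 1:
        return v
    best = max(q_plan(core, s, o, d - 1) for o in core.options)
    return v / d + (d - 1) / d * best


def best_option(core: ToyCore, s: State, d: int) -> Option:
    """Pick the option with the highest depth-d planned value."""
    return max(core.options, key=lambda o: q_plan(core, s, o, d))


# Toy chain with positions 0..3; entering position 3 yields reward 1.
GOAL = 3

def _step(s: State, o: Option) -> State:
    return min(max(s + o, 0), GOAL)

chain = ToyCore(
    value=lambda s: 0.0,  # an uninformative value estimate
    outcome=lambda s, o: (1.0 if s != GOAL and _step(s, o) == GOAL else 0.0, 0.9),
    transition=_step,
    options=(-1, +1),
)

# With the value estimate at zero, only lookahead reveals the reward.
two_step_right = q_plan(chain, 1, +1, 2)
two_step_left = q_plan(chain, 1, -1, 2)
```

In this toy setting the value estimate is zero everywhere, so from position 1 a depth-1 evaluation cannot distinguish the two options, while a depth-2 evaluation prefers moving right because the predicted reward enters through the backed-up value. Nothing here predicts observations; planning runs entirely on predicted rewards, discounts, values, and abstract states, which is the property the abstract emphasizes.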

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
