AI · LG · May 28, 2018

Value Propagation Networks

arXiv:1805.11199v2 · 28 citations
Originality: Incremental advance
AI Analysis

VProp provides a cost-efficient learning system for building low-level planners in interactive navigation problems, though it appears incremental: it extends existing differentiable planning methods rather than introducing a new paradigm.

The paper tackles two problems: learning to plan in unseen tasks and generalizing to larger environments. It introduces Value Propagation (VProp), a set of parameter-efficient differentiable planning modules based on Value Iteration, and reports successful navigation in static and dynamic MazeBase grid-worlds and in a StarCraft scenario.

We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration which can successfully be trained using reinforcement learning to solve unseen tasks, generalize to larger map sizes, and learn to navigate in dynamic environments. We show that the modules enable learning to plan when the environment also includes stochastic elements, providing a cost-efficient learning system to build low-level size-invariant planners for a variety of interactive navigation problems. We evaluate on static and dynamic configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes, and on a StarCraft navigation scenario with more complex dynamics and pixels as input.
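For readers unfamiliar with the Value Iteration procedure that VProp builds on, the following is a minimal sketch of classical tabular value iteration on a small grid-world. It is not the paper's learned, differentiable module. The grid encoding (`#` for walls, `.` for free cells), the reward values, the discount factor, and the function names `value_iteration` and `greedy_path` are all assumptions chosen for illustration.

```python
def neighbors(grid, r, c):
    """Yield the free 4-connected neighbours of cell (r, c)."""
    rows, cols = len(grid), len(grid[0])
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
            yield nr, nc


def value_iteration(grid, goal, gamma=0.9, step_reward=-0.04, iters=50):
    """Classical value iteration; the goal is terminal with value 0.

    Entering the goal yields reward 1.0, any other move yields step_reward
    (both values are illustrative assumptions, not taken from the paper).
    """
    rows, cols = len(grid), len(grid[0])
    values = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        new_values = [row[:] for row in values]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "#" or (r, c) == goal:
                    continue
                scores = [
                    (1.0 if nxt == goal else step_reward)
                    + gamma * values[nxt[0]][nxt[1]]
                    for nxt in neighbors(grid, r, c)
                ]
                if scores:  # isolated cells keep their previous value
                    new_values[r][c] = max(scores)
        values = new_values
    return values


def greedy_path(grid, values, start, goal, gamma=0.9, step_reward=-0.04,
                max_steps=100):
    """Follow the greedy policy induced by the value map."""
    path, pos = [start], start
    while pos != goal and len(path) <= max_steps:
        candidates = list(neighbors(grid, *pos))
        if not candidates:
            break
        pos = max(
            candidates,
            key=lambda n: (1.0 if n == goal else step_reward)
            + gamma * values[n[0]][n[1]],
        )
        path.append(pos)
    return path


MAZE = [
    "....",
    ".##.",
    "....",
]
START, GOAL = (0, 0), (2, 3)
VALUES = value_iteration(MAZE, GOAL)
PATH = greedy_path(MAZE, VALUES, START, GOAL)
```

In this sketch the value map is computed by repeatedly applying the same local max-over-neighbours update to every cell. That locality is what makes this family of planners expressible as convolution-like operations, and is the intuition behind planning modules that do not depend on map size.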

