LG AI ML

Sep 26, 2020

Graph neural induction of value iteration

Andreea Deac, Pierre-Luc Bacon, Jian Tang

arXiv:2009.12604v18.512 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for more flexible and directly supervised planning components in deep reinforcement learning systems, though it appears incremental, extending existing neural planning methods to broader environments.

The paper tackles the problem of incorporating explicit planning into reinforcement learning by proposing a graph neural network that directly executes the value iteration algorithm across arbitrary environment models. It reports accurate modeling of value iteration and favorable metrics in out-of-distribution tests.

Many reinforcement learning tasks can benefit from explicit planning based on an internal model of the environment. Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration. Such networks have so far been focused on restrictive environments (e.g. grid-worlds), and modelled the planning procedure only indirectly. We relax these constraints, proposing a graph neural network (GNN) that executes the value iteration (VI) algorithm, across arbitrary environment models, with direct supervision on the intermediate steps of VI. The results indicate that GNNs are able to model value iteration accurately, recovering favourable metrics and policies across a variety of out-of-distribution tests. This suggests that GNN executors with strong supervision are a viable component within deep reinforcement learning systems.
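
For readers unfamiliar with the algorithm the GNN is trained to imitate, here is a minimal sketch of classical tabular value iteration. It is not the authors' model: the function names, the three-state toy MDP, and the choice to return the full list of intermediate value estimates are illustrative assumptions. The per-step estimates are included because they are plausibly the kind of targets that "direct supervision on the intermediate steps of VI" would use.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-6, max_steps=1000):
    """Tabular value iteration (illustrative sketch, not the paper's GNN).

    P[a][s][s2] is the probability of moving from s to s2 under action a.
    R[s][a] is the expected immediate reward for taking action a in state s.
    Returns the final value estimates and every intermediate estimate.
    """
    n_states = len(R)
    n_actions = len(P)
    v = [0.0] * n_states
    trajectory = [v[:]]  # intermediate estimates, one per VI step
    for _ in range(max_steps):
        # Bellman optimality backup for every state
        new_v = [
            max(
                R[s][a] + gamma * sum(P[a][s][s2] * v[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        trajectory.append(new_v[:])
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v, trajectory


def greedy_policy(P, R, v, gamma=0.9):
    """Pick, for each state, the action with the highest one-step lookahead value."""
    n_states = len(R)
    n_actions = len(P)
    policy = []
    for s in range(n_states):
        q = [
            R[s][a] + gamma * sum(P[a][s][s2] * v[s2] for s2 in range(n_states))
            for a in range(n_actions)
        ]
        policy.append(q.index(max(q)))
    return policy


# Hypothetical toy MDP: a 3-state chain. Action 0 stays put, action 1 moves right.
# State 2 is absorbing; moving from state 1 into state 2 yields reward 1.
P = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # action 1: move right
]
R = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

values, steps = value_iteration(P, R, gamma=0.9)
policy = greedy_policy(P, R, values, gamma=0.9)
```

In this toy chain the values propagate backwards from the rewarding transition one step per iteration, which is the kind of step-by-step behaviour a learned executor would need to reproduce.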
