LG · AI · RO · Jun 18, 2021

Learning to Plan via a Multi-Step Policy Regression Method

arXiv:2106.10075v1
Originality: Incremental advance
AI Analysis

This work addresses slow inference in sequential decision-making tasks. It is an incremental advance because it builds on existing policy distillation and A2C methods.

The paper tackles slow inference in environments that require a sequence of actions, such as mazes. It proposes a multi-step policy regression method that predicts n actions ahead from a single observation, and reports a drastic inference-time speedup on the MiniGrid and Pong environments.
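The source of the speedup can be made concrete with a small sketch. It assumes that the agent executes all n predicted actions open-loop before querying the policy again. The exact execution scheme is not described here, so this is an assumption. Under it, an episode of T steps needs only ceil(T/n) forward passes instead of T. The function names `run_open_loop` and `expected_calls`, and the toy policy and environment callables, are hypothetical.

```python
import math
from typing import Callable, List


def run_open_loop(
    policy: Callable[[object], List[int]],
    step: Callable[[int], object],
    first_obs: object,
    total_steps: int,
) -> int:
    """Execute `total_steps` actions and return how many policy calls were made.

    Assumption (hypothetical scheme): each call to `policy` returns a sequence
    of n actions that are executed open-loop; intermediate observations are
    not fed back to the policy.
    """
    obs = first_obs
    calls = 0
    executed = 0
    while executed < total_steps:
        actions = policy(obs)  # one forward pass yields n actions
        if not actions:
            raise ValueError("policy must return at least one action")
        calls += 1
        for action in actions:
            if executed == total_steps:
                break
            # Only the observation after the last executed action is used next.
            obs = step(action)
            executed += 1
    return calls


def expected_calls(total_steps: int, n: int) -> int:
    """Forward passes needed when each call yields n actions: ceil(T / n)."""
    return math.ceil(total_steps / n)


def four_step_policy(obs: object) -> List[int]:
    """Hypothetical policy that always predicts 4 actions."""
    return [0, 1, 2, 3]


def dummy_env_step(action: int) -> object:
    """Hypothetical environment step that returns no observation."""
    return None


print(run_open_loop(four_step_policy, dummy_env_step, None, 10))  # 3
print(expected_calls(10, 4))  # 3
```

With n = 1 the same loop makes 10 calls for a 10-step episode. The saving therefore grows roughly linearly with the prediction horizon, as long as the predicted sequence remains correct without intermediate feedback.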

We propose a new approach to increase inference performance in environments that require a specific sequence of actions in order to be solved. This is the case, for example, in maze environments, where ideally an optimal path is determined. Instead of learning a policy for a single step, we want to learn a policy that can predict n actions in advance. Our proposed method, called policy horizon regression (PHR), uses knowledge of the environment sampled by A2C to learn an n-dimensional policy vector in a policy distillation setup, which yields n sequential actions per observation. We test our method on the MiniGrid and Pong environments and show a drastic speedup at inference time by successfully predicting sequences of actions from a single observation.
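Below is a minimal sketch of the distillation objective, built on stated assumptions. The "n-dimensional policy vector" is modeled here as n separate action heads. Each head k is trained with cross-entropy against the action that the A2C teacher actually took k steps after the observation. The method's name ("regression") suggests the authors' loss may differ, for example a squared-error or KL objective on teacher outputs. Treat `horizon_distillation_loss`, `greedy_actions`, and the hard-label targets as illustrative, not as the paper's implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one action head."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def horizon_distillation_loss(
    head_logits: Sequence[Sequence[float]],
    teacher_actions: Sequence[int],
) -> float:
    """Mean cross-entropy between n student heads and n teacher actions.

    head_logits[k]: student logits for the action at step t + k.
    teacher_actions[k]: action the teacher (assumed A2C rollout) took at t + k.
    Hard-label cross-entropy is an assumption, not the paper's stated loss.
    """
    if len(head_logits) != len(teacher_actions):
        raise ValueError("need exactly one teacher action per head")
    total = 0.0
    for logits, action in zip(head_logits, teacher_actions):
        total -= log_softmax(logits)[action]
    return total / len(teacher_actions)


def greedy_actions(head_logits: Sequence[Sequence[float]]) -> List[int]:
    """Decode n sequential actions from a single forward pass (argmax per head)."""
    return [max(range(len(logits)), key=lambda i: logits[i]) for logits in head_logits]


# Two heads over three actions; uniform logits give a loss of log(3) per head.
print(round(horizon_distillation_loss([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [1, 2]), 4))  # 1.0986
print(greedy_actions([[0.1, 2.0, -1.0], [3.0, 0.0, 0.0]]))  # [1, 0]
```

Averaging over heads keeps the loss scale independent of n. A real implementation might instead weight near-term heads more heavily, since errors early in the sequence compound during open-loop execution.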

