AI LG

Nov 19, 2019

Attention-Privileged Reinforcement Learning

Sasha Salter, Dushyant Rao, Markus Wulfmeier, Raia Hadsell, Ingmar Posner

arXiv:1911.08363v3

11.98 citations

Originality: Incremental advance

AI Analysis

The paper addresses sample efficiency and robustness in reinforcement learning for visual tasks, but the advance is incremental because it builds on existing methods such as visual domain randomization.

The paper tackles poor sample efficiency and generalization in image-based reinforcement learning by introducing APRiL, which uses a self-supervised attention mechanism to focus on task-relevant aspects of the observations. The authors report accelerated learning and improved performance on diverse domains.

Image-based Reinforcement Learning is known to suffer from poor sample efficiency and generalisation to unseen visuals such as distractors (task-independent aspects of the observation space). Visual domain randomisation encourages transfer by training over visual factors of variation that may be encountered in the target domain. This increases learning complexity, can negatively impact learning rate and performance, and requires knowledge of potential variations during deployment. In this paper, we introduce Attention-Privileged Reinforcement Learning (APRiL) which uses a self-supervised attention mechanism to significantly alleviate these drawbacks: by focusing on task-relevant aspects of the observations, attention provides robustness to distractors as well as significantly increased learning efficiency. APRiL trains two attention-augmented actor-critic agents: one purely based on image observations, available across training and transfer domains; and one with access to privileged information (such as environment states) available only during training. Experience is shared between both agents and their attention mechanisms are aligned. The image-based policy can then be deployed without access to privileged information. We experimentally demonstrate accelerated and more robust learning on a diverse set of domains, leading to improved final performance for environments both within and outside the training distribution.
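The abstract names two mechanisms without giving their form: experience shared between the two agents, and alignment of their attention. The following is a minimal sketch of how those two pieces might look, not the authors' implementation. Every name here (`Transition`, `SharedReplayBuffer`, `softmax_attention`, `alignment_loss`) is hypothetical, and the mean-squared-error alignment term is an assumption chosen for simplicity; the abstract does not state which loss is used.

```python
import math
import random
from collections import deque
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transition:
    """One step of experience, usable by both agents (hypothetical layout)."""
    state: List[float]  # privileged information, available only during training
    image: List[float]  # flattened image observation, available everywhere
    action: int
    reward: float


class SharedReplayBuffer:
    """Single buffer that both the state-based and image-based agents sample from."""

    def __init__(self, capacity: int = 1000) -> None:
        # deque with maxlen drops the oldest transition once capacity is reached
        self._items: deque = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Sample without replacement, capped at the number of stored items
        k = min(batch_size, len(self._items))
        return rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)


def softmax_attention(scores: Sequence[float]) -> List[float]:
    """Turn raw per-location scores into an attention map that sums to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_loss(attn_a: Sequence[float], attn_b: Sequence[float]) -> float:
    """Mean squared difference between two attention maps (assumed form)."""
    if len(attn_a) != len(attn_b):
        raise ValueError("attention maps must cover the same locations")
    return sum((a - b) ** 2 for a, b in zip(attn_a, attn_b)) / len(attn_a)
```

In a training loop built on this sketch, either agent could collect transitions into the one buffer, both would sample from it for their actor-critic updates, and `alignment_loss` would be added to the losses so that the image-based agent's attention tracks that of the privileged agent. At deployment only the image-based policy and its attention would be needed.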
