LG | AI | Apr 30, 2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

arXiv:2004.14646v1 | 156 citations
AI Analysis

This paper addresses the challenge of building robust representations for multitask, partially observable environments in deep reinforcement learning. Because its prediction tasks are defined entirely in latent space, the method can handle multimodal inputs such as images and language.

The paper tackles representation learning for multitask reinforcement learning by introducing Prediction of Bootstrap Latents (PBL), a self-supervised algorithm that predicts latent embeddings of future observations to capture environment dynamics. The authors report improved performance over state-of-the-art deep RL agents in the DMLab-30 and Atari-57 multitask settings.

Learning a good representation is an essential component of deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings, where building a representation of the unknown environment is crucial to solving the tasks. Here we introduce Prediction of Bootstrap Latents (PBL), a simple and flexible self-supervised representation learning algorithm for multitask deep RL. PBL builds on multistep predictive representations of future observations, and focuses on capturing structured information about environment dynamics. Specifically, PBL trains its representation by predicting latent embeddings of future observations. These latent embeddings are themselves trained to be predictive of the aforementioned representations. These predictions form a bootstrapping effect, allowing the agent to learn more about the key aspects of the environment dynamics. In addition, by defining prediction tasks completely in latent space, PBL provides the flexibility of using multimodal observations involving pixel images, language instructions, rewards and more. We show in our experiments that PBL delivers across-the-board improved performance over state-of-the-art deep RL agents in the DMLab-30 and Atari-57 multitask settings.
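
The two coupled prediction tasks described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear maps `W_embed`, `W_forward`, and `W_reverse`, the squared-error losses, the dimensions, and the random data are all assumptions made for illustration, and actions are left out entirely. In a real agent these would be learned networks, and each loss would typically treat its target as a constant (a stop-gradient), which plain NumPy can only note in comments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: trajectory length, observation, agent-state and latent dims,
# and the number of future steps to predict.
T, OBS_DIM, STATE_DIM, LATENT_DIM, HORIZON = 8, 6, 5, 4, 3

# Hypothetical linear stand-ins for the learned networks.
W_embed = rng.normal(size=(OBS_DIM, LATENT_DIM))               # observation -> latent embedding
W_forward = rng.normal(size=(HORIZON, STATE_DIM, LATENT_DIM))  # agent state at t -> latent at t+k
W_reverse = rng.normal(size=(LATENT_DIM, STATE_DIM))           # latent at t -> agent state at t


def forward_loss(states, latents, w_forward):
    """Average squared error of predicting the latent at t+k from the state at t.

    The latents act as fixed targets here; only the state side would be trained.
    """
    total, count = 0.0, 0
    for k in range(1, w_forward.shape[0] + 1):
        pred = states[:-k] @ w_forward[k - 1]  # predictions made at t = 0 .. T-k-1
        target = latents[k:]                   # embeddings observed k steps later
        total += np.sum((pred - target) ** 2)
        count += pred.shape[0]
    return total / count


def reverse_loss(states, latents, w_reverse):
    """Average squared error of predicting the state at t from the latent at t.

    The states act as fixed targets here; only the embedding side would be trained.
    """
    pred = latents @ w_reverse
    return np.mean(np.sum((pred - states) ** 2, axis=1))


# Random placeholders for one trajectory of observations and agent states.
observations = rng.normal(size=(T, OBS_DIM))
states = rng.normal(size=(T, STATE_DIM))
latents = observations @ W_embed

fwd = forward_loss(states, latents, W_forward)
rev = reverse_loss(states, latents, W_reverse)
print(f"forward loss: {fwd:.3f}, reverse loss: {rev:.3f}")
```

Minimizing both losses together is what produces the bootstrapping effect the abstract mentions: the representation is trained toward latents that are themselves being trained toward the representation. Since both targets live in latent space, other modalities could in principle be added by changing only the embedding step.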
