LG · ML · Nov 25, 2019

Disentangled Cumulants Help Successor Representations Transfer to New Tasks

arXiv:1911.10866v1 · 15 citations
Originality: Incremental advance
AI Analysis

The paper addresses slow learning and poor transfer in reinforcement learning agents and offers a principled approach to biologically inspired skill reuse. Its contribution is incremental, since it applies disentangled representations to existing successor methods.

The paper tackles data-efficient knowledge transfer in reinforcement learning by learning a basis set of policies in a disentangled latent space. Recombining these policies lets the agent adapt quickly to new tasks, with guarantees on how much of the task space is covered, and reach high performance on an exponentially larger number of downstream tasks, many of them more complex than the training tasks.

Biological intelligence can learn to solve many diverse tasks in a data-efficient manner by re-using basic knowledge and skills from one task to another. Furthermore, many such skills are acquired without explicit supervision, in an intrinsically driven fashion. This is in contrast to state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch and struggle with knowledge transfer. In this paper we propose a principled way to learn a basis set of policies which, when recombined through generalised policy improvement, come with guarantees on the coverage of the final task space. In particular, we concentrate on solving goal-based downstream tasks where the execution order of actions is not important. We demonstrate both theoretically and empirically that a small number of policies that reach intrinsically specified goal regions in a disentangled latent space can be re-used to quickly achieve a high level of performance on an exponentially larger number of externally specified, often significantly more complex downstream tasks. Our learning pipeline consists of two stages. First, the agent learns to perform intrinsically generated, goal-based tasks in the total absence of environmental rewards. Second, the agent leverages this experience to quickly achieve a high level of performance on numerous diverse externally specified tasks.
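
The recombination step named in the abstract, generalised policy improvement (GPI), can be illustrated with a minimal sketch. It assumes the standard successor-feature setting, in which the reward is linear in some features (cumulants), so that each basis policy's action values are a dot product between its successor features and a task weight vector. The function name `gpi_action`, the array layout, and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def gpi_action(psi, w):
    """Choose an action by generalised policy improvement (GPI).

    psi: array of shape (n_policies, n_actions, n_features). Hypothetical
         successor features of each basis policy for each action in the
         current state.
    w:   array of shape (n_features,). Task weights, assuming the reward
         is linear in the features (cumulants).
    Returns the greedy action index and the per-action GPI values.
    """
    q = psi @ w               # (n_policies, n_actions): Q-values of each basis policy
    q_gpi = q.max(axis=0)     # best basis-policy value for each action
    return int(np.argmax(q_gpi)), q_gpi


# Toy state with 2 basis policies, 3 actions, 2 features (made-up numbers).
# Policy 0 mostly accumulates feature 0; policy 1 mostly accumulates feature 1.
psi_example = np.array([
    [[0.9, 0.0], [0.1, 0.0], [0.5, 0.1]],   # basis policy 0
    [[0.0, 0.1], [0.0, 0.9], [0.1, 0.5]],   # basis policy 1
])

# A new task is just a new weight vector; no basis policy is retrained.
action_a, values_a = gpi_action(psi_example, np.array([1.0, 0.0]))
action_b, values_b = gpi_action(psi_example, np.array([0.0, 1.0]))
```

In this toy state, a task that rewards only feature 0 selects action 0 and a task that rewards only feature 1 selects action 1, even though neither weight vector was seen when the basis policies were formed. The sketch covers only action selection in a single state; how the paper learns the disentangled features and the basis policies is not shown here.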

