LG | AI | Dec 8, 2022

Learning Options via Compression

arXiv:2212.04590v1 | 22 citations | h-index: 93
Originality: Incremental advance
AI Analysis

This work addresses the underspecification issue in skill learning for multi-task reinforcement learning, offering a method that improves sample efficiency and scalability, though it is incremental as it builds on existing latent variable models.

The paper tackles the problem of degenerate solutions in skill learning for multi-task reinforcement learning by proposing a new objective that combines maximum likelihood with a penalty on skill description length, resulting in skills that solve downstream tasks in fewer samples and scale to high-dimensional image observations.

Identifying statistical regularities in solutions to some tasks in multi-task reinforcement learning can accelerate the learning of new tasks. Skill learning offers one way of identifying these regularities by decomposing pre-collected experiences into a sequence of skills. A popular approach to skill learning is maximizing the likelihood of the pre-collected experience with latent variable models, where the latent variables represent the skills. However, there are often many solutions that maximize the likelihood equally well, including degenerate solutions. To address this underspecification, we propose a new objective that combines the maximum likelihood objective with a penalty on the description length of the skills. This penalty incentivizes the skills to maximally extract common structures from the experiences. Empirically, our objective learns skills that solve downstream tasks in fewer samples compared to skills learned from only maximizing likelihood. Further, while most prior works in the offline multi-task setting focus on tasks with low-dimensional observations, our objective can scale to challenging tasks with high-dimensional image observations.
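To make the idea of a description-length penalty concrete, here is a minimal sketch, not the paper's actual objective. It assumes skills are discrete symbols, measures description length as the number of bits needed to encode every skill occurrence with a code matched to the empirical skill frequencies, and adds that cost to a negative log-likelihood. The function names, the `penalty_weight` parameter, and the toy decompositions are all hypothetical.

```python
import math
from collections import Counter


def description_length_bits(skill_sequences):
    """Bits needed to encode all skill occurrences with a code matched
    to the empirical skill frequencies (an illustrative assumption)."""
    counts = Counter(z for seq in skill_sequences for z in seq)
    total = sum(counts.values())
    return sum(-c * math.log2(c / total) for c in counts.values())


def penalized_objective(neg_log_likelihood, skill_sequences, penalty_weight=1.0):
    """Negative log-likelihood plus a weighted description-length penalty."""
    return neg_log_likelihood + penalty_weight * description_length_bits(skill_sequences)


# Two hypothetical decompositions of the same two trajectories,
# assumed here to explain the data equally well (same likelihood).
degenerate = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]  # one skill per step, no reuse
shared = [["x", "y"], ["x", "y"]]                          # longer skills, reused across tasks

dl_degenerate = description_length_bits(degenerate)
dl_shared = description_length_bits(shared)

same_nll = 10.0  # placeholder value: both decompositions fit equally well
loss_degenerate = penalized_objective(same_nll, degenerate)
loss_shared = penalized_objective(same_nll, shared)
```

In this toy setting the likelihood term cannot distinguish the two decompositions, but the penalty favors the one whose skills are reused across trajectories, which is the kind of tie-breaking the abstract describes.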

Code Implementations: 1 repo
