AI | LG | Sep 24, 2022

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

arXiv:2209.12016v2 | 30 citations | h-index: 44
AI Analysis

This work addresses sample efficiency and generalization in visual control for AI agents, and reports a substantial gain on the Unsupervised RL Benchmark.

The authors address generalization in unsupervised reinforcement learning from visual inputs by combining unsupervised model-based RL pre-training with task-aware fine-tuning and a hybrid planner, Dyna-MPC. The method achieves 93.59% overall normalized performance on the Unsupervised RL Benchmark, surpassing previous baselines.

Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but require large amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; Laskin et al. 2021), whether current unsupervised strategies can improve generalization capabilities is still unclear, especially in visual control settings. In this work, we study the URLB and propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent, and a task-aware fine-tuning strategy combined with a new proposed hybrid planner, Dyna-MPC, to adapt the agent for downstream tasks. On URLB, our method obtains 93.59% overall normalized performance, surpassing previous baselines by a staggering margin. The approach is empirically evaluated through a large-scale empirical study, which we use to validate our design choices and analyze our models. We also show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation. Project website: https://masteringurlb.github.io/
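As a rough illustration of how an "overall normalized performance" figure can be read, the sketch below divides each task's return by a reference return and averages the results. This is an assumption for illustration only: the exact normalization and aggregation used by the authors are not stated in this abstract, and the function name, task names, and returns are all hypothetical.

```python
from statistics import mean


def normalized_score(agent_return: float, reference_return: float) -> float:
    """Express an agent's return as a fraction of a reference return."""
    if reference_return <= 0:
        raise ValueError("reference_return must be positive")
    return agent_return / reference_return


# Hypothetical (agent_return, reference_return) pairs; NOT values from the paper.
example_returns = {
    "task_a": (450.0, 500.0),
    "task_b": (800.0, 1000.0),
    "task_c": (300.0, 250.0),
}

# Normalize each task separately, then average across tasks.
per_task = {
    name: normalized_score(agent, reference)
    for name, (agent, reference) in example_returns.items()
}
overall = mean(per_task.values())

print(f"overall normalized performance: {overall:.2%}")
```

Under this reading, a per-task score above 1.0 means the agent beat the reference on that task, so an overall figure near 100% means the agent roughly matches the reference on average.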

Code Implementations: 1 repo
