CV · AI · Mar 25, 2022

Reinforcement Learning with Action-Free Pre-Training from Videos

arXiv:2203.13880v2 · 154 citations · h-index: 164 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses sample inefficiency in vision-based RL for robotics and AI applications. It is an incremental advance that adapts pre-training methods from other domains.

The paper aims to improve sample efficiency in vision-based reinforcement learning by introducing an unsupervised pre-training framework that learns representations from videos. The authors report that the framework significantly improves final performance and sample efficiency across manipulation and locomotion tasks.

Recent unsupervised pre-training methods have been shown to be effective in language and vision domains by learning useful representations for multiple downstream tasks. In this paper, we investigate whether such unsupervised pre-training methods can also be effective for vision-based reinforcement learning (RL). To this end, we introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos. Our framework consists of two phases: we pre-train an action-free latent video prediction model, and then utilize the pre-trained representations for efficiently learning action-conditional world models on unseen environments. To incorporate additional action inputs during fine-tuning, we introduce a new architecture that stacks an action-conditional latent prediction model on top of the pre-trained action-free prediction model. Moreover, for better exploration, we propose a video-based intrinsic bonus that leverages pre-trained representations. We demonstrate that our framework significantly improves both final performance and sample efficiency of vision-based RL in a variety of manipulation and locomotion tasks. Code is available at https://github.com/younggyoseo/apv.
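The stacked architecture described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes tiny deterministic tanh layers with random weights in place of learned latent dynamics models, and every name here (`ActionFreeModel`, `StackedActionConditionalModel`, `dense_tanh`, the dimensions) is hypothetical. It only shows the data flow: the action-free model updates its latent from frame features alone, and the action-conditional model on top consumes that latent together with the action.

```python
import math
import random


def dense_tanh(weights, inputs):
    """Apply one dense layer (a list of weight rows) followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def init_weights(rng, n_out, n_in, scale=0.5):
    """Random weight matrix; a placeholder for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


class ActionFreeModel:
    """Phase 1 stand-in: predicts the next latent from latent and frame features only."""

    def __init__(self, rng, obs_dim, latent_dim):
        self.latent_dim = latent_dim
        self.weights = init_weights(rng, latent_dim, latent_dim + obs_dim)

    def step(self, latent, obs):
        # No action input: this part could be trained on videos without action labels.
        return dense_tanh(self.weights, latent + obs)


class StackedActionConditionalModel:
    """Phase 2 stand-in: an action-conditional layer stacked on the action-free model."""

    def __init__(self, rng, action_free, state_dim, action_dim):
        self.action_free = action_free  # reused from the pre-training phase
        in_dim = state_dim + action_dim + action_free.latent_dim
        self.weights = init_weights(rng, state_dim, in_dim)

    def step(self, state, latent, obs, action):
        # Lower level: action-free latent update.
        next_latent = self.action_free.step(latent, obs)
        # Upper level: combine previous state, action, and the action-free latent.
        next_state = dense_tanh(self.weights, state + action + next_latent)
        return next_state, next_latent


rng = random.Random(0)
OBS_DIM, LATENT_DIM, STATE_DIM, ACTION_DIM = 4, 3, 2, 2

pretrained = ActionFreeModel(rng, OBS_DIM, LATENT_DIM)
world_model = StackedActionConditionalModel(rng, pretrained, STATE_DIM, ACTION_DIM)

obs = [0.1, -0.2, 0.3, 0.0]      # hypothetical frame features
latent = [0.0] * LATENT_DIM
state = [0.0] * STATE_DIM

# Two different actions from the same starting point.
state_a, latent_a = world_model.step(state, latent, obs, [1.0, 0.0])
state_b, latent_b = world_model.step(state, latent, obs, [0.0, 1.0])
```

In this sketch the action-free latent is identical for both actions, while the stacked state differs, which is the separation the two-phase design relies on: only the upper layer needs action-labeled data.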

Code Implementations: 2 repos
