LG | AI | Feb 11, 2023

Cross-domain Random Pre-training with Prototypes for Reinforcement Learning

arXiv:2302.05614v5 | 14 citations | h-index: 20
Originality: Highly original
AI Analysis

This work addresses the problem of efficient and effective pre-training for reinforcement learning across diverse domains, such as balance control and robot locomotion, offering a novel approach that reduces pre-training burden without requiring extra exploration agents.

The paper tackles unsupervised cross-domain reinforcement learning pre-training for continuous visual control. It proposes CRPTpro, a framework that decouples data sampling from encoder pre-training. CRPTpro outperforms the next-best method, Proto-RL(C), on 11 of 12 cross-domain downstream tasks while using only 54.5% of the wall-clock pre-training time.

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Unsupervised cross-domain Reinforcement Learning (RL) pre-training shows great potential for challenging continuous visual control but remains a major challenge. In this paper, we propose Cross-domain Random Pre-Training with prototypes (CRPTpro), a novel, efficient, and effective self-supervised cross-domain RL pre-training framework. CRPTpro decouples data sampling from encoder pre-training, using decoupled random collection to quickly and easily generate a qualified cross-domain pre-training dataset. Moreover, a novel prototypical self-supervised algorithm is proposed to pre-train an effective visual encoder that is generic across different domains. Without finetuning, the cross-domain encoder can be applied to challenging downstream tasks defined in different domains, either seen or unseen. Compared with recent advanced methods, CRPTpro achieves better performance on downstream policy learning without extra training of exploration agents for data collection, greatly reducing the burden of pre-training. We conduct extensive experiments across eight challenging continuous visual-control domains, including balance control, robot locomotion, and manipulation. CRPTpro significantly outperforms the next-best method, Proto-RL(C), on 11/12 cross-domain downstream tasks with only 54.5% of the wall-clock pre-training time, exhibiting state-of-the-art pre-training performance with greatly improved pre-training efficiency.
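As a rough illustration of what decoupling data sampling from encoder pre-training can look like, the sketch below gathers observations from several domains with a uniformly random policy and pools them into one shuffled dataset, with no exploration agent trained. It is not the authors' implementation: `ToyDomain`, `collect_random_dataset`, the domain names, the dimensions, and the action range of [-1, 1] are all hypothetical stand-ins, and a real setup would render image observations from actual control environments.

```python
import random
from typing import List, Tuple


class ToyDomain:
    """Hypothetical stand-in for one visual-control domain (assumption)."""

    def __init__(self, name: str, obs_dim: int, action_dim: int, seed: int) -> None:
        self.name = name
        self.obs_dim = obs_dim
        self.action_dim = action_dim
        self._rng = random.Random(seed)

    def reset(self) -> List[float]:
        # Return a fake observation vector; a real domain would return pixels.
        return [self._rng.random() for _ in range(self.obs_dim)]

    def step(self, action: List[float]) -> List[float]:
        # Toy dynamics: only the action's shape is checked, not its values.
        assert len(action) == self.action_dim
        return [self._rng.random() for _ in range(self.obs_dim)]


def collect_random_dataset(
    domains: List[ToyDomain],
    steps_per_domain: int,
    episode_len: int,
    seed: int = 0,
) -> List[Tuple[str, List[float]]]:
    """Pool observations from every domain using uniformly random actions."""
    rng = random.Random(seed)
    dataset: List[Tuple[str, List[float]]] = []
    for domain in domains:
        obs = domain.reset()
        for t in range(steps_per_domain):
            dataset.append((domain.name, obs))
            # Random policy: no learned exploration agent is involved.
            action = [rng.uniform(-1.0, 1.0) for _ in range(domain.action_dim)]
            obs = domain.step(action)
            if (t + 1) % episode_len == 0:
                obs = domain.reset()
    # Mix domains so later encoder pre-training sees cross-domain batches.
    rng.shuffle(dataset)
    return dataset


# Usage: two hypothetical domains, 50 observations each.
domains = [
    ToyDomain("balance", obs_dim=4, action_dim=1, seed=1),
    ToyDomain("locomotion", obs_dim=6, action_dim=3, seed=2),
]
dataset = collect_random_dataset(domains, steps_per_domain=50, episode_len=10)
```

Because collection here depends only on the environments and a random number generator, the dataset can be built once and reused for any number of encoder pre-training runs, which is the kind of separation the abstract describes.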

Code Implementations: 1 repo
