LG · May 23, 2024

PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning

Chengyang Ying, Zhongkai Hao, Xinning Zhou, Xuezhou Xu, Hang Su, Xingxing Zhang, Jun Zhu

Tsinghua

arXiv:2405.14073v2 · 15.014 citations · h-index: 18 · Has Code · NIPS

Originality: Highly original

AI Analysis

This work addresses the challenge of deploying RL agents across diverse real-world embodiments, and it proposes a novel approach rather than an incremental improvement.

The paper tackles cross-embodiment generalization in reinforcement learning by introducing Cross-Embodiment Unsupervised RL (CEURL) and the PEAC algorithm, which uses unsupervised pre-training to acquire embodiment-aware knowledge. The authors report significant improvements in adaptation performance and generalization across simulated and real-world environments.

Designing generalizable agents capable of adapting to diverse embodiments has attracted significant attention in Reinforcement Learning (RL) and is critical for deploying RL agents in various real-world applications. Previous Cross-Embodiment RL approaches have focused on transferring knowledge across embodiments within specific tasks. These methods often yield knowledge that is tightly coupled with those tasks and fail to adequately capture the distinct characteristics of different embodiments. To address this limitation, we introduce the notion of Cross-Embodiment Unsupervised RL (CEURL), which leverages unsupervised learning to enable agents to acquire embodiment-aware and task-agnostic knowledge through online interactions within reward-free environments. We formulate CEURL as a novel Controlled Embodiment Markov Decision Process (CE-MDP) and systematically analyze CEURL's pre-training objectives under CE-MDP. Based on these analyses, we develop a novel algorithm, Pre-trained Embodiment-Aware Control (PEAC), for handling CEURL, incorporating an intrinsic reward function specifically designed for cross-embodiment pre-training. PEAC not only provides an intuitive optimization strategy for cross-embodiment pre-training but can also integrate flexibly with existing unsupervised RL methods, facilitating cross-embodiment exploration and skill discovery. Extensive experiments in both simulated (e.g., DMC and Robosuite) and real-world environments (e.g., legged locomotion) demonstrate that PEAC significantly improves adaptation performance and cross-embodiment generalization, showing its effectiveness in overcoming the unique challenges of CEURL. The project page and code are available at https://yingchengyang.github.io/ceurl.

