One-Policy-Fits-All: Geometry-Aware Action Latents for Cross-Embodiment Manipulation
This work addresses the challenge of scaling robot manipulation and reducing data collection costs for robotics researchers and practitioners, though it is an incremental advance in multi-embodiment learning.
The paper tackles cross-embodiment manipulation by proposing the One-Policy-Fits-All (OPFA) framework, which learns a single policy across multiple robot embodiments. Reported gains include success rates improved by more than 50% compared to single-source training, and performance comparable to a model trained on 72 demonstrations when only eight demonstrations from a new embodiment are added.
Cross-embodiment manipulation is crucial for enhancing the scalability of robot manipulation and reducing the high cost of data collection. However, the significant differences between embodiments, such as variations in action spaces and structural disparities, pose challenges for joint training across multiple sources of data. To address this, we propose One-Policy-Fits-All (OPFA), a framework that enables learning a single, versatile policy across multiple embodiments. We first learn a Geometry-Aware Latent Representation (GaLR), which leverages 3D convolution networks and transformers to build a shared latent action space across different embodiments. Then we design a unified latent retargeting decoder that extracts embodiment-specific actions from the latent representations, without any embodiment-specific decoder tuning. OPFA enables end-to-end co-training of data from diverse embodiments, including various grippers and dexterous hands with arbitrary degrees of freedom, significantly improving data efficiency and reducing the cost of skill transfer. We conduct extensive experiments across 11 different end-effectors. The results demonstrate that OPFA significantly improves policy performance in diverse settings by leveraging heterogeneous embodiment data. For instance, cross-embodiment co-training can improve success rates by more than 50% compared to single-source training. Moreover, by adding only a few demonstrations from a new embodiment (e.g., eight), OPFA can achieve performance comparable to that of a well-trained model with 72 demonstrations.
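To make the idea of a single decoder serving end-effectors with arbitrary degrees of freedom more concrete, the following is a minimal sketch of one possible mechanism, not the OPFA architecture itself. It assumes, purely for illustration, that each joint is described by a small geometric descriptor and that one shared set of weights scores the shared latent together with each descriptor, so the output length follows the number of joints. The names `LATENT_DIM`, `DESCRIPTOR_DIM`, `make_shared_weights`, and `decode`, as well as the dimensions and the linear-plus-tanh scoring, are all hypothetical and do not come from the paper.

```python
import math
import random

LATENT_DIM = 8      # hypothetical size of the shared latent action
DESCRIPTOR_DIM = 3  # hypothetical size of a per-joint geometric descriptor


def make_shared_weights(seed=0):
    """Create one weight vector that is reused for every joint of every embodiment."""
    rng = random.Random(seed)
    in_dim = LATENT_DIM + DESCRIPTOR_DIM
    return [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]


def decode(latent, joint_descriptors, weights):
    """Map one shared latent to one bounded command per joint.

    The same weights score the concatenation [latent, descriptor] for each
    joint, so the number of outputs equals the number of descriptors given.
    This is an illustrative stand-in, not the paper's retargeting decoder.
    """
    if len(latent) != LATENT_DIM:
        raise ValueError("latent must have LATENT_DIM entries")
    commands = []
    for descriptor in joint_descriptors:
        if len(descriptor) != DESCRIPTOR_DIM:
            raise ValueError("each descriptor must have DESCRIPTOR_DIM entries")
        features = list(latent) + list(descriptor)
        score = sum(w * x for w, x in zip(weights, features))
        commands.append(math.tanh(score))  # squash into [-1, 1]
    return commands


# Usage: the same latent and the same weights drive two different end-effectors.
weights = make_shared_weights()
latent = [0.1] * LATENT_DIM
gripper = [[0.0, 0.0, 1.0]]                      # hypothetical 1-DoF gripper
hand = [[0.1 * i, 0.0, 1.0] for i in range(16)]  # hypothetical 16-DoF hand
gripper_cmd = decode(latent, gripper, weights)
hand_cmd = decode(latent, hand, weights)
print(len(gripper_cmd), len(hand_cmd))
```

In this sketch no weights are specific to either end-effector, which loosely mirrors the abstract's claim of decoding without embodiment-specific decoder tuning. How OPFA actually encodes geometry and extracts actions is described only at a high level in the abstract and may differ substantially.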