Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
This work addresses the high cost of collecting demonstrations for robot learning by enabling pre-training across diverse platforms, though its contribution to conflict resolution is incremental.
The study tackles scalable robot policy pre-training by combining offline reinforcement learning with cross-embodiment learning on heterogeneous robot datasets. It finds that this approach outperforms behavior cloning but runs into conflicting gradients as data diversity increases. An embodiment-based grouping strategy mitigates these conflicts and improves performance.
Scalable robot policy pre-training has been hindered by the high cost of collecting high-quality demonstrations for each platform. In this study, we address this issue by uniting offline reinforcement learning (offline RL) with cross-embodiment learning. Offline RL leverages both expert data and abundant suboptimal data, while cross-embodiment learning aggregates heterogeneous robot trajectories across diverse morphologies to acquire universal control priors. We perform a systematic analysis of this combined paradigm, providing a principled understanding of its strengths and limitations. To evaluate it, we construct a suite of locomotion datasets spanning 16 distinct robot platforms. Our experiments confirm that the combined approach excels at pre-training on datasets rich in suboptimal trajectories, outperforming pure behavior cloning. However, as the proportion of suboptimal data and the number of robot types increase, we observe that conflicting gradients across morphologies begin to impede learning. To mitigate this, we introduce an embodiment-based grouping strategy in which robots are clustered by morphological similarity and the model is updated with a group gradient. This simple, static grouping substantially reduces inter-robot conflicts and outperforms existing conflict-resolution methods.
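The notions of conflicting gradients and a group gradient can be made concrete with a small sketch. It is not the authors' implementation: the abstract does not say how conflicts are measured or how a group gradient is formed. The sketch assumes that two gradients conflict when their cosine similarity is negative and that a group gradient is the mean of its members' gradients. The robot names, group labels, and two-dimensional gradient values are hypothetical toy data.

```python
import math
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def conflicting_pairs(grads):
    """Return name pairs whose gradients point in opposing directions.

    Assumption: a conflict means negative cosine similarity.
    """
    names = sorted(grads)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cosine(grads[a], grads[b]) < 0.0
    ]


def group_gradients(grads, group_of):
    """Average per-robot gradients within each static group.

    Assumption: the group gradient is the plain mean of member gradients.
    """
    members = defaultdict(list)
    for robot, grad in grads.items():
        members[group_of[robot]].append(grad)
    return {
        group: [sum(column) / len(rows) for column in zip(*rows)]
        for group, rows in members.items()
    }


# Hypothetical per-robot gradients of a shared policy (toy values).
GRADS = {
    "quadruped_a": [1.0, 0.5],
    "quadruped_b": [0.8, 0.7],
    "biped_a": [-1.0, 0.2],
    "biped_b": [-0.7, 0.4],
}

# Hypothetical static grouping by morphological similarity.
GROUP_OF = {
    "quadruped_a": "quadruped",
    "quadruped_b": "quadruped",
    "biped_a": "biped",
    "biped_b": "biped",
}

pairs = conflicting_pairs(GRADS)
within_group = [(a, b) for a, b in pairs if GROUP_OF[a] == GROUP_OF[b]]
print("conflicting pairs:", pairs)
print("conflicts inside a group:", within_group)
print("group gradients:", group_gradients(GRADS, GROUP_OF))
```

In this toy data every conflicting pair straddles the two groups, so averaging within a group never combines opposing gradients. Whether real morphology clusters behave this way is an empirical question that the paper's experiments address.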