CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations
CLAM addresses the bottleneck of costly action-labeled data in robot imitation learning, enabling scalable training from unlabeled observations, though it is an incremental improvement over existing unsupervised approaches.
The paper tackles the problem of learning robot policies from unlabeled demonstrations by proposing CLAM, a method that uses continuous latent action models and joint training to achieve a 2-3x improvement in task success rate over prior methods on benchmarks and a real robot arm.
Learning robot policies using imitation learning requires collecting large amounts of costly action-labeled expert demonstrations, which fundamentally limits the scale of training data. A promising approach to address this bottleneck is to harness the abundance of unlabeled observations (e.g., from video demonstrations) to learn latent action labels in an unsupervised way. However, we find that existing methods struggle when applied to complex robot tasks requiring fine-grained motions. We design continuous latent action models (CLAM) which incorporate two key ingredients we find necessary for learning to solve complex continuous control tasks from unlabeled observation data: (a) using continuous latent action labels instead of discrete representations, and (b) jointly training an action decoder to ensure that the latent action space can be easily grounded to real actions with relatively few labeled examples. Importantly, the labeled examples can be collected from non-optimal play data, enabling CLAM to learn performant policies without access to any action-labeled expert data. We demonstrate on continuous control benchmarks in DMControl (locomotion) and MetaWorld (manipulation), as well as on a real WidowX robot arm, that CLAM significantly outperforms prior state-of-the-art methods, remarkably with a 2-3x improvement in task success rate compared to the best baseline. Videos and code can be found at clamrobot.github.io.
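To make ingredients (a) and (b) concrete, the following is a minimal sketch of a joint training objective, not the authors' implementation. It assumes the common latent-action setup in which an inverse dynamics model infers a continuous latent z from a pair of consecutive observations and a forward dynamics model reconstructs the next observation from the current observation and z; the abstract does not spell out this structure. The linear maps, the function names (`init_params`, `infer_latent`, `joint_loss`), and the weight `beta` are hypothetical choices made for illustration.

```python
import numpy as np

def init_params(obs_dim, latent_dim, action_dim, rng):
    """Small random linear maps standing in for neural networks (assumption)."""
    scale = 0.1
    return {
        "idm": scale * rng.standard_normal((latent_dim, 2 * obs_dim)),
        "fdm": scale * rng.standard_normal((obs_dim, obs_dim + latent_dim)),
        "dec": scale * rng.standard_normal((action_dim, latent_dim)),
    }

def infer_latent(params, obs, next_obs):
    """Inverse dynamics: continuous latent action from (o_t, o_{t+1}), no quantization."""
    return np.concatenate([obs, next_obs], axis=-1) @ params["idm"].T

def joint_loss(params, obs, next_obs, labeled_idx, actions, beta=1.0):
    """Reconstruction loss on all transitions plus decoder loss on the labeled subset."""
    z = infer_latent(params, obs, next_obs)
    # Forward dynamics: predict o_{t+1} from o_t and the latent action.
    pred_next = np.concatenate([obs, z], axis=-1) @ params["fdm"].T
    recon = np.mean(np.sum((pred_next - next_obs) ** 2, axis=-1))
    # Action decoder: ground latents to real actions where labels exist.
    pred_act = z[labeled_idx] @ params["dec"].T
    act = np.mean(np.sum((pred_act - actions) ** 2, axis=-1))
    return recon + beta * act, recon, act

# Toy data: 32 unlabeled transitions, of which the first 8 carry action labels.
rng = np.random.default_rng(0)
obs = rng.standard_normal((32, 4))
next_obs = obs + 0.1 * rng.standard_normal((32, 4))
labeled_idx = np.arange(8)
actions = rng.standard_normal((8, 2))

params = init_params(obs_dim=4, latent_dim=3, action_dim=2, rng=rng)
total, recon, act = joint_loss(params, obs, next_obs, labeled_idx, actions, beta=0.5)
```

Because both terms depend on the same latent z, minimizing them together pushes the latent space to be predictive of the next observation and decodable into real actions at once, which is the intuition behind training the decoder jointly rather than fitting it afterwards. In this sketch the labeled subset can be any transitions with known actions, which matches the abstract's point that labels may come from non-optimal play data.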