How to Train Your Robots? The Impact of Demonstration Modality on Imitation Learning
This work addresses the problem of efficient data collection for robot imitation learning, offering a practical solution for researchers and practitioners, though it is incremental in optimizing existing methods.
The study investigated how different robot demonstration modalities affect imitation learning performance and user experience, finding that kinesthetic teaching provides the cleanest data for best learning results but is physically demanding for large-scale collection.
Imitation learning is a promising approach for learning robot policies with user-provided data. The way demonstrations are provided, i.e., demonstration modality, influences the quality of the data. While existing research shows that kinesthetic teaching (physically guiding the robot) is preferred by users for the intuitiveness and ease of use, the majority of existing manipulation datasets were collected through teleoperation via a VR controller or spacemouse. In this work, we investigate how different demonstration modalities impact downstream learning performance as well as user experience. Specifically, we compare low-cost demonstration modalities including kinesthetic teaching, teleoperation with a VR controller, and teleoperation with a spacemouse controller. We experiment with three table-top manipulation tasks with different motion constraints. We evaluate and compare imitation learning performance using data from different demonstration modalities, and collected subjective feedback on user experience. Our results show that kinesthetic teaching is rated the most intuitive for controlling the robot and provides cleanest data for best downstream learning performance. However, it is not preferred as the way for large-scale data collection due to the physical load. Based on such insight, we propose a simple data collection scheme that relies on a small number of kinesthetic demonstrations mixed with data collected through teleoperation to achieve the best overall learning performance while maintaining low data-collection effort.