Vision-based Robot Manipulation Learning via Human Demonstrations
This work addresses the challenge of enabling robots to learn and adapt manipulation tasks from limited demonstrations, though the contribution appears incremental, combining existing vision and knowledge-base methods.
The paper tackles the problem of generalizing vision-based robot manipulation skills from a single human demonstration to real-world interactions, and reports good generalization performance even with small amounts of training data.
Vision-based learning methods hold promise for robots to learn complex manipulation tasks. However, how to generalize the learned manipulation skills to real-world interactions remains an open question. In this work, we study robotic manipulation skill learning from a single third-person view demonstration by using activity recognition and object detection in computer vision. To facilitate generalization across objects and environments, we propose to use a prior knowledge base in the form of a text corpus to infer the object to be interacted with in the context of a robot. We evaluate our approach on a real-world robot, using several simple and complex manipulation tasks commonly performed in daily life. The experimental results show that our approach achieves good generalization performance even from small amounts of training data.
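The abstract does not say how the text corpus is queried, so the following is only a minimal sketch of one plausible reading: given an action label from activity recognition and a list of object labels from object detection, pick the detected object that co-occurs most often with the action in the corpus. The co-occurrence scoring, the function names (`cooccurrence_counts`, `infer_target_object`), and the toy corpus are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter


def cooccurrence_counts(corpus, verbs, nouns):
    """Count how often each (verb, noun) pair appears in the same sentence.

    Hypothetical stand-in for a text-corpus knowledge base; the paper's
    actual representation is not described in the abstract.
    """
    counts = Counter()
    for sentence in corpus:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for verb in verbs & tokens:
            for noun in nouns & tokens:
                counts[(verb, noun)] += 1
    return counts


def infer_target_object(action, detected_objects, counts):
    """Return the detected object most associated with the action.

    Ties are broken by detection order; returns None if nothing was detected.
    """
    if not detected_objects:
        return None
    return max(detected_objects, key=lambda obj: counts[(action, obj)])


# Toy corpus (assumption): a real knowledge base would be far larger.
corpus = [
    "Pour the water into the cup.",
    "She used the knife to cut the apple.",
    "Pour milk into the cup slowly.",
    "He cut the bread with a knife.",
]
verbs = {"pour", "cut"}
nouns = {"cup", "knife", "apple", "bread", "water", "milk"}
counts = cooccurrence_counts(corpus, verbs, nouns)

# Action label from activity recognition, object labels from detection.
print(infer_target_object("pour", ["knife", "cup"], counts))
print(infer_target_object("cut", ["cup", "knife"], counts))
```

In this reading, the corpus prior lets the robot choose a sensible target among objects it detects in a new scene, even when that scene differs from the one shown in the demonstration.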