Vision-based Teleoperation of Shadow Dexterous Hand using End-to-End Deep Neural Network
This work addresses the challenge of enabling novice users to teleoperate complex robotic hands for tasks such as grasping. The contribution is incremental, since it builds on existing vision-based teleoperation methods.
The paper tackles intuitive, markerless vision-based teleoperation of dexterous robotic hands. It develops TeachNet, an end-to-end deep neural network that generates robot joint angles directly from depth images of the human hand. In imitation experiments and grasp tasks, TeachNet is more reliable and faster than the state-of-the-art vision-based teleoperation method.
In this paper, we present TeachNet, a novel neural network architecture for intuitive and markerless vision-based teleoperation of dexterous robotic hands. Robot joint angles are generated directly from depth images of the human hand in an end-to-end fashion, such that the resulting robot hand poses are visually similar to the human hand poses. The special structure of TeachNet, combined with a consistency loss function, handles the differences in appearance and anatomy between human and robotic hands. A synchronized human-robot training set is generated from an existing dataset of labeled depth images of the human hand and simulated depth images of a robotic hand. The final training set includes 400K pairwise depth images and joint angles of a Shadow C6 robotic hand. The network evaluation results confirm the advantage of TeachNet, especially under the high-precision condition. Imitation experiments and grasp tasks teleoperated by novice users demonstrate that TeachNet is more reliable and faster than the state-of-the-art vision-based teleoperation method.
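To illustrate how a consistency loss might be combined with joint angle regression, the following is a minimal sketch and not the authors' implementation. It assumes a two-branch setup in which a human-image branch and a robot-image branch each produce an embedding and a joint angle prediction. The function names, the squared-error form of both terms, and the `weight` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def joint_angle_loss(pred, target):
    """Mean squared error between predicted and ground-truth joint angles."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


def consistency_loss(z_human, z_robot):
    """Squared Euclidean distance between paired embeddings, averaged over the batch.

    Assumption: both branches emit embeddings of shape (batch, dim).
    """
    z_human = np.asarray(z_human, dtype=float)
    z_robot = np.asarray(z_robot, dtype=float)
    return float(np.mean(np.sum((z_human - z_robot) ** 2, axis=1)))


def total_loss(pred_human, pred_robot, target, z_human, z_robot, weight=1.0):
    """Regression loss of both branches plus a weighted consistency term.

    `weight` is a hypothetical hyperparameter, not a value from the paper.
    """
    return (
        joint_angle_loss(pred_human, target)
        + joint_angle_loss(pred_robot, target)
        + weight * consistency_loss(z_human, z_robot)
    )
```

In this sketch, the consistency term pulls the embedding of a human hand image toward the embedding of its paired robot hand image. This is one plausible way to encourage a shared representation despite the differences in appearance and anatomy between the two hands.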