HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos

Zhi Wang, Botao He, Kelin Yu, Seungjae Lee, Ruohan Gao, Furong Huang, Yiannis Aloimonos

arXiv:2605.2493491.1

AI Analysis

This work addresses the embodiment gap for transferring human manipulation skills to robots without requiring robot data, enabling data-efficient and hardware-agnostic robot learning.

HumanEgo enables zero-shot robot learning from only minutes of human egocentric videos, achieving 92.5% average success across four real-world tasks with 30 minutes of video per task, outperforming matched-time robot teleoperation by 41%.

Human egocentric video captures rich manipulation demonstrations without any robot hardware, yet transferring these skills to robots remains challenging due to the embodiment gap between human and robot in both visual appearance and kinematics. We present HumanEgo, a framework that bridges the embodiment gap by lifting each human demonstration to an entity-level representation of hand-object interaction, and training a flow matching policy with dense auxiliary objectives that amplify supervision from every trajectory. HumanEgo is robot-data-free, hardware-agnostic, data-efficient, and zero-shot human-to-robot transferable. With only 30 minutes of human videos per task, HumanEgo achieves 92.5% average success across four real-world tasks (75% with just 15 minutes), outperforms matched-time robot teleoperation by 41%, and robustly transfers zero-shot across novel robots, cameras, and environments.
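
The abstract names a flow matching policy but does not spell out its objective. As a rough illustration only, the sketch below shows a generic flow matching loss on a straight-line path from noise to a demonstrated action, plus Euler sampling at inference time. It is not the authors' implementation: the function names, the straight-line path, and the omission of observation conditioning and of the auxiliary objectives are all assumptions made here for brevity.

```python
# Hypothetical sketch of a generic flow matching objective, not HumanEgo's code.
# A real policy would also condition velocity_fn on observations.

def flow_matching_loss(velocity_fn, noise, action, t):
    """Mean squared error between predicted and target velocity for one sample."""
    # Point on the straight path from noise (t=0) to the demonstrated action (t=1).
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    # The straight path has constant velocity, which is the regression target.
    target = [a - n for n, a in zip(noise, action)]
    pred = velocity_fn(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


def sample_action(velocity_fn, noise, steps=10):
    """Integrate the learned velocity field from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In training, `noise` would be drawn from a standard Gaussian and `t` uniformly from [0, 1); at deployment, `sample_action` turns a noise draw into an action by following the learned field.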

