RO | Apr 27

Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation

arXiv:2604.24681 | 93.5
AI Analysis

For robotic manipulation, this work provides a method to extract and transfer human-intention priors from diverse human videos to robot control, addressing the challenge of embodiment mismatch.

The paper addresses how to use large-scale human demonstration videos for robot learning. It introduces MoT-HRA, a hierarchical framework that learns human-intention priors, together with HA-2.2M, a 2.2M-episode dataset curated from heterogeneous human videos. The authors report more plausible motion and more robust control under distribution shift.

Human videos contain rich manipulation priors, but using them for robot learning remains difficult because raw observations entangle scene understanding, human motion, and embodiment-specific action. We introduce MoT-HRA, a hierarchical vision-language-action framework that learns human-intention priors from large-scale human demonstrations. We first curate HA-2.2M, a 2.2M-episode action-language dataset reconstructed from heterogeneous human videos through hand-centric filtering, spatial reconstruction, temporal segmentation, and language alignment. On top of this dataset, MoT-HRA factorizes manipulation into three coupled experts: a vision-language expert predicts an embodiment-agnostic 3D trajectory, an intention expert models MANO-style hand motion as a latent human-motion prior, and a fine expert maps the intention-aware representation to robot action chunks. A shared-attention trunk and read-only key-value transfer allow downstream control to use human priors while limiting interference with upstream representations. Experiments on hand motion generation, simulated manipulation, and real-world robot tasks show that MoT-HRA improves motion plausibility and robust control under distribution shift.
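
The abstract does not spell out how the read-only key-value transfer works, so the following is only a minimal sketch of one plausible reading: each expert attends to keys and values from itself and from upstream experts, but never the reverse, so downstream tokens cannot change upstream outputs. The expert names, token counts, and helper functions (`build_mask`, `masked_attention`) are hypothetical, and the sketch covers the forward pass only; it says nothing about how gradients are handled in MoT-HRA.

```python
import math

# Hypothetical expert order, upstream to downstream (names are illustrative).
EXPERTS = ["vision_language", "intention", "fine"]


def build_mask(token_counts):
    """Return mask[q][k] = True when query token q may read key token k.

    Assumed rule: a token reads keys/values from its own expert and from
    upstream experts only, so downstream tokens never alter upstream outputs.
    """
    owner = [i for i, n in enumerate(token_counts) for _ in range(n)]
    size = len(owner)
    return [[owner[k] <= owner[q] for k in range(size)] for q in range(size)]


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention over lists of float vectors."""
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q_idx, q in enumerate(queries):
        # Every token may at least read itself, so `allowed` is never empty.
        allowed = [k for k in range(len(keys)) if mask[q_idx][k]]
        scores = [sum(a * b for a, b in zip(q, keys[k])) / scale for k in allowed]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * values[k][d] for w, k in zip(weights, allowed))
            for d in range(len(values[0]))
        ])
    return outputs


# Two vision-language tokens, one intention token, one fine (action) token.
mask = build_mask([2, 1, 1])
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
outputs = masked_attention(tokens, tokens, tokens, mask)
```

Under this mask, changing the fine expert's token leaves the outputs of the two upstream experts untouched, which is the "limited interference" property the abstract describes. A real implementation would more likely cache upstream keys and values once and reuse them, rather than recompute attention over all tokens.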

