RO · May 14

HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations

arXiv:2603.03243 · 94.87 citations
Predicted impact: top 6% in RO · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the challenge of scalable data collection and policy transfer for mobile manipulation, enabling robot learning from human demonstrations without requiring robots during data collection.

HoMMI introduces a framework for learning whole-body mobile manipulation from human demonstrations, achieving long-horizon tasks requiring bimanual coordination and navigation by bridging the embodiment gap with a cross-embodiment hand-eye policy design.

We present Whole-Body Mobile Manipulation Interface (HoMMI), a data collection and policy learning framework that learns whole-body mobile manipulation directly from robot-free human demonstrations. We augment UMI interfaces with egocentric sensing to capture the global context required for mobile manipulation, enabling portable, robot-free, and scalable data collection. However, naively incorporating egocentric sensing introduces a larger human-to-robot embodiment gap in both observation and action spaces, making policy transfer difficult. We explicitly bridge this gap with a cross-embodiment hand-eye policy design comprising an embodiment-agnostic visual representation, a relaxed head action representation, and a whole-body controller that realizes hand-eye trajectories through coordinated whole-body motion under robot-specific physical constraints. Together, these enable long-horizon mobile manipulation tasks requiring bimanual and whole-body coordination, navigation, and active perception. Results are best viewed at: https://hommi-robot.github.io
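To make the idea of realizing a hand trajectory through coordinated whole-body motion under physical constraints more concrete, here is a minimal toy sketch. It is not HoMMI's controller, whose details the abstract does not give. It assumes a hypothetical planar robot with a sliding base and a two-link arm, and it uses damped least-squares inverse kinematics with joint-limit clamping as a stand-in technique. The link lengths, limits, gains, and function names are all illustrative assumptions.

```python
import math

# Hypothetical robot: a base sliding along x plus a 2-link planar arm.
L1, L2 = 0.4, 0.3  # assumed link lengths in metres
# Assumed limits for (base x, shoulder angle, elbow angle).
LIMITS = [(-5.0, 5.0), (-math.pi / 2, math.pi / 2), (0.1, 2.5)]


def hand_position(q):
    """Forward kinematics: hand (x, y) for q = [base, shoulder, elbow]."""
    b, q1, q2 = q
    x = b + L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y


def jacobian(q):
    """2x3 Jacobian of the hand position with respect to q."""
    _, q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [
        [1.0, -L1 * s1 - L2 * s12, -L2 * s12],
        [0.0, L1 * c1 + L2 * c12, L2 * c12],
    ]


def step(q, target, damping=0.05, gain=0.5):
    """One damped least-squares update, then clamp to the joint limits."""
    x, y = hand_position(q)
    e = (target[0] - x, target[1] - y)
    J = jacobian(q)
    # A = J J^T + damping^2 I, a symmetric 2x2 matrix [[a, b], [b, d]].
    a = sum(v * v for v in J[0]) + damping ** 2
    b = sum(u * v for u, v in zip(J[0], J[1]))
    d = sum(v * v for v in J[1]) + damping ** 2
    det = a * d - b * b
    # w = A^-1 e, using the closed-form 2x2 inverse.
    w0 = (d * e[0] - b * e[1]) / det
    w1 = (-b * e[0] + a * e[1]) / det
    # dq = gain * J^T w spreads the motion across base and arm joints.
    dq = [gain * (J[0][i] * w0 + J[1][i] * w1) for i in range(3)]
    return [min(max(qi + dqi, lo), hi)
            for qi, dqi, (lo, hi) in zip(q, dq, LIMITS)]


def solve(target, q0=(0.0, 0.3, 0.8), iters=200):
    """Iterate the update from an assumed start configuration."""
    q = list(q0)
    for _ in range(iters):
        q = step(q, target)
    return q


if __name__ == "__main__":
    q = solve((2.0, 0.4))
    print("configuration:", q)
    print("hand position:", hand_position(q))
```

In this toy setup the target lies beyond the arm's reach from the starting base position, so the base has to translate while the arm adjusts. This mirrors, in a very reduced form, the coordination between navigation and manipulation that the abstract describes.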

