RO | AI | CV | Apr 8

RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild

arXiv:2604.07331 | 71.7
AI Analysis

This paper addresses the need for scalable data collection for robot learning in real-world settings, though its contribution is incremental, since it builds on existing sensor fusion methods.

The paper tackles the problem of collecting rich, long-horizon human motion data in the wild by introducing RoSHI, a hybrid wearable system that fuses IMUs with egocentric glasses to estimate 3D pose and body shape, outperforming other egocentric baselines and performing comparably to a state-of-the-art exocentric baseline (SAM3D).

Scaling up robot learning will likely require human data containing rich and long-horizon interactions in the wild. Existing approaches for collecting such data trade off portability, robustness to occlusion, and global consistency. We introduce RoSHI, a hybrid wearable that fuses low-cost sparse IMUs with the Project Aria glasses to estimate the full 3D pose and body shape of the wearer in a metric global coordinate frame from egocentric perception. This system is motivated by the complementarity of the two sensors: IMUs provide robustness to occlusions and high-speed motions, while egocentric SLAM anchors long-horizon motion and stabilizes upper body pose. We collect a dataset of agile activities to evaluate RoSHI. On this dataset, we generally outperform other egocentric baselines and perform comparably to a state-of-the-art exocentric baseline (SAM3D). Finally, we demonstrate that the motion data recorded from our system are suitable for real-world humanoid policy learning. For videos, data and more, visit the project webpage: https://roshi-mocap.github.io/
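
To make the complementarity argument concrete, here is a minimal toy sketch of the general idea: a high-rate but drifting IMU estimate is anchored by a globally consistent position signal such as SLAM. This is a plain 1D complementary filter, not RoSHI's actual fusion method, which the abstract does not describe. The function name `fuse_positions`, the blend weight `alpha`, and all numeric values are assumptions chosen for illustration.

```python
def fuse_positions(imu_velocities, slam_positions, dt=0.01, alpha=0.98, initial=0.0):
    """Toy 1D complementary filter (illustrative only, not the paper's method).

    imu_velocities: per-step velocity estimates; a bias here makes pure
                    integration drift over time.
    slam_positions: per-step global position, or None when no fix is available.
    alpha:          weight on the IMU prediction (hypothetical value).
    """
    estimate = initial
    fused = []
    for velocity, anchor in zip(imu_velocities, slam_positions):
        predicted = estimate + velocity * dt  # dead-reckoning step from the IMU
        if anchor is None:
            estimate = predicted  # no global fix: rely on the IMU alone
        else:
            # Blend: the IMU gives smooth short-term motion, the anchor removes drift.
            estimate = alpha * predicted + (1.0 - alpha) * anchor
        fused.append(estimate)
    return fused


# Synthetic example: true speed 1.0 m/s, IMU velocity biased by +0.2 m/s.
steps, dt = 2000, 0.01
true_positions = [(k + 1) * dt * 1.0 for k in range(steps)]
biased_velocities = [1.2] * steps

dead_reckoned = fuse_positions(biased_velocities, [None] * steps, dt=dt)
anchored = fuse_positions(biased_velocities, true_positions, dt=dt)

drift_error = abs(dead_reckoned[-1] - true_positions[-1])
anchored_error = abs(anchored[-1] - true_positions[-1])
```

In this toy setup the IMU-only estimate drifts by about 4 m over 20 s, while the anchored estimate stays within roughly 0.1 m of the true position. A real system would fuse full 3D orientation and position for many body segments and would have to handle noisy or intermittent anchors.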

