CV · Apr 14, 2025

HUMOTO: A 4D Dataset of Mocap Human Object Interactions

Jiaxin Lu, Chun-Hao Paul Huang, Uttaran Bhattacharya, Qixing Huang, Yi Zhou

arXiv:2504.10414v2 · 26.1 · 27 citations · h-index: 19

Originality: Synthesis-oriented

AI Analysis

This dataset addresses data scarcity for researchers in motion generation, computer vision, and robotics, though its contribution is incremental, since it builds on existing dataset efforts.

The researchers tackled the lack of high-fidelity human-object interaction data by creating HUMOTO, a dataset of 735 sequences involving 63 objects. It addresses key data-capturing challenges and provides benchmarks for advancing realistic human-object interaction modeling.

We present Human Motions with Objects (HUMOTO), a high-fidelity dataset of human-object interactions for motion generation, computer vision, and robotics applications. Featuring 735 sequences (7,875 seconds at 30 fps), HUMOTO captures interactions with 63 precisely modeled objects and 72 articulated parts. Our innovations include a scene-driven LLM scripting pipeline creating complete, purposeful tasks with natural progression, and a mocap-and-camera recording setup to effectively handle occlusions. Spanning diverse activities from cooking to outdoor picnics, HUMOTO preserves both physical accuracy and logical task flow. Professional artists rigorously clean and verify each sequence, minimizing foot sliding and object penetrations. We also provide benchmark comparisons with other datasets. HUMOTO's comprehensive full-body motion and simultaneous multi-object interactions address key data-capturing challenges and provide opportunities to advance realistic human-object interaction modeling across research domains with practical applications in animation, robotics, and embodied AI systems. Project: https://jiaxin-lu.github.io/humoto/ .
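As a quick sanity check on the scale quoted above, the headline figures can be turned into frame counts and average sequence lengths. This is a minimal sketch. It assumes every one of the 7,875 seconds is sampled at exactly 30 fps, so actual per-sequence frame counts in the release may differ slightly because of rounding. The function name `dataset_stats` is hypothetical and is not part of any HUMOTO tooling.

```python
# Headline figures quoted in the HUMOTO abstract.
NUM_SEQUENCES = 735
TOTAL_SECONDS = 7_875
FPS = 30


def dataset_stats(num_sequences: int, total_seconds: float, fps: int) -> dict:
    """Derive frame counts and average durations from the headline numbers.

    Hypothetical helper: assumes a uniform frame rate across all sequences.
    """
    total_frames = int(round(total_seconds * fps))
    return {
        "total_frames": total_frames,
        "avg_seconds_per_sequence": total_seconds / num_sequences,
        "avg_frames_per_sequence": total_frames / num_sequences,
        "total_minutes": total_seconds / 60,
    }


if __name__ == "__main__":
    stats = dataset_stats(NUM_SEQUENCES, TOTAL_SECONDS, FPS)
    for key, value in stats.items():
        # Print floats with two decimals and integers as-is.
        print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these assumptions, the dataset holds about 236,250 frames (roughly 131 minutes). Sequences average about 10.7 seconds, or roughly 321 frames, each.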

