CV · Dec 10, 2023

I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions

arXiv:2312.08869v2 · 43 citations · CVPR
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of conveniently capturing interactions between humans and smart devices. The advance is incremental: it builds on existing motion capture techniques but introduces a novel hybrid sensor setup.

The paper tackles the problem of capturing 3D human-object interactions using only a monocular RGB camera and an object-mounted IMU. It achieves faithful motion recovery by combining general motion inference with category-aware refinement.

We are living in a world surrounded by diverse and "smart" devices with rich sensing modalities. Yet conveniently capturing the interactions between us humans and these objects remains out of reach. In this paper, we present I'm-HOI, a monocular scheme to faithfully capture the 3D motions of both the human and the object in a novel setting: using a minimal setup of one RGB camera and an object-mounted Inertial Measurement Unit (IMU). It combines general motion inference and category-aware refinement. For the former, we introduce a holistic human-object tracking method that fuses the IMU signals and the RGB stream to progressively recover the human motion and, subsequently, the companion object motion. For the latter, we tailor a category-aware motion diffusion model, which is conditioned on both the raw IMU observations and the results from the previous stage under an over-parameterized representation. It significantly refines the initial results and generates vivid body, hand, and object motions. Moreover, we contribute a large dataset with ground-truth human and object motions, dense RGB inputs, and rich object-mounted IMU measurements. Extensive experiments demonstrate the effectiveness of I'm-HOI under a hybrid capture setting. Our dataset and code will be released to the community.
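The abstract names two ingredients of the refinement stage without spelling out their form: an over-parameterized motion representation, and a diffusion model conditioned on the raw IMU observations plus the first-stage results. The sketch below shows one common way such pieces are realized; it is not the authors' implementation. Assumptions not taken from the paper: rotations use the continuous 6D representation (first two rotation-matrix columns, recovered by Gram-Schmidt), the condition is a per-frame concatenation of IMU acceleration, IMU orientation, and first-stage poses, and noising follows the standard DDPM forward process. All function names (`rotmat_to_6d`, `sixd_to_rotmat`, `build_condition`, `q_sample`) are hypothetical.

```python
import numpy as np

def rotmat_to_6d(R: np.ndarray) -> np.ndarray:
    """Over-parameterize rotations (..., 3, 3) as their first two columns (..., 6).
    Assumption: a 6D representation; the paper's exact form is not stated."""
    return R[..., :, :2].swapaxes(-1, -2).reshape(*R.shape[:-2], 6)

def sixd_to_rotmat(x: np.ndarray) -> np.ndarray:
    """Map (..., 6) back to valid rotations via Gram-Schmidt orthonormalization."""
    a1, a2 = x[..., :3], x[..., 3:]
    b1 = a1 / np.linalg.norm(a1, axis=-1, keepdims=True)
    b2 = a2 - np.sum(b1 * a2, axis=-1, keepdims=True) * b1
    b2 = b2 / np.linalg.norm(b2, axis=-1, keepdims=True)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=-1)  # columns b1, b2, b3

def build_condition(imu_acc, imu_rot, stage1_pose_6d):
    """Hypothetical per-frame condition: IMU acceleration (T, 3),
    IMU orientation (T, 3, 3) and first-stage poses (T, J*6) -> (T, 9 + J*6)."""
    return np.concatenate([imu_acc, rotmat_to_6d(imu_rot), stage1_pose_6d], axis=-1)

def q_sample(x0, t, alphas_cumprod, rng):
    """Standard DDPM forward noising: x_t = sqrt(a_t) x0 + sqrt(1 - a_t) eps."""
    noise = rng.standard_normal(x0.shape)
    a = alphas_cumprod[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise, noise

# Usage with toy sizes: T frames, J joints (illustrative values only).
rng = np.random.default_rng(0)
T, J = 4, 2
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
imu_acc = rng.standard_normal((T, 3))
imu_rot = np.stack([np.eye(3)] * T)
stage1 = rotmat_to_6d(np.stack([np.eye(3)] * (T * J))).reshape(T, J * 6)
cond = build_condition(imu_acc, imu_rot, stage1)
x_t, eps = q_sample(stage1, t=500, alphas_cumprod=alphas_cumprod, rng=rng)
print(cond.shape, x_t.shape)  # (4, 21) (4, 12)
```

Keeping the condition per frame keeps it time-aligned with the noisy motion `x_t`, so a denoiser can consume both together; whether I'm-HOI conditions its diffusion model this way is not specified in the abstract.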
