CV | Dec 20, 2022

Full-Body Articulated Human-Object Interaction

Nan Jiang, Tengyu Liu, Zhexuan Cao, Jieming Cui, Zhiyuan Zhang, Yixin Chen, He Wang, Yixin Zhu, Siyuan Huang

Peking U

arXiv:2212.10621v326.885 citations | h-index: 32 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of fine-grained 3D human-object interaction understanding for applications like action recognition and scene reconstruction, though it is incremental in advancing beyond rigid object assumptions.

The paper tackles full-body articulated human-object interaction (f-AHOI) by introducing CHAIRS, a large-scale dataset with 16.2 hours of motion-captured interactions, together with a model that uses an estimated human pose to recover the poses and shapes of articulated objects. The model significantly outperforms baselines on object pose estimation.

Fine-grained capturing of 3D human-object interaction (HOI) boosts human activity understanding and facilitates downstream visual tasks, including action recognition, holistic scene reconstruction, and human motion synthesis. Despite its significance, existing works mostly assume that humans interact with rigid objects using only a few body parts, limiting their scope. In this paper, we address the challenging problem of full-body articulated human-object interaction (f-AHOI), wherein whole human bodies interact with articulated objects, whose parts are connected by movable joints. We present CHAIRS, a large-scale motion-captured f-AHOI dataset, consisting of 16.2 hours of versatile interactions between 46 participants and 81 articulated and rigid sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process, as well as realistic and physically plausible full-body interactions. We show the value of CHAIRS with object pose estimation. By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation to tackle the estimation of articulated object poses and shapes during whole-body interactions. Given an image and an estimated human pose, our model first reconstructs the pose and shape of the object, then optimizes the reconstruction according to a learned interaction prior. Under both evaluation settings (i.e., with or without knowledge of the objects' geometries/structures), our model significantly outperforms baselines. We hope CHAIRS will promote the community towards finer-grained interaction understanding. We will make the data/code publicly available.
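To make the notion of an articulated object concrete, the following is a minimal sketch of a single revolute (hinge) joint connecting a child part to a parent part. It is not the paper's model or the CHAIRS data format; the function names, the hinge placement, and the chair dimensions are all hypothetical, chosen only to illustrate how one joint angle changes an object's geometry.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: 3x3 rotation matrix for `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def articulate(points, pivot, axis, angle):
    """Rotate the child part's points about a hinge that passes through `pivot`."""
    R = rotation_about_axis(axis, angle)
    pivot = np.asarray(pivot, dtype=float)
    return (np.asarray(points, dtype=float) - pivot) @ R.T + pivot

# Hypothetical chair: the backrest hinges about the x-axis at the seat's rear edge.
pivot = np.array([0.0, 0.0, 0.5])        # hinge location (assumed, in metres)
back_top = np.array([[0.0, 0.0, 1.0]])   # one point at the top of the backrest
reclined = articulate(back_top, pivot, axis=[1.0, 0.0, 0.0], angle=np.pi / 2)
```

With this parameterization, an articulated object's configuration is its rigid pose plus one angle per joint, which is what distinguishes the setting from rigid-object pose estimation.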

