MoCapDeform: Monocular 3D Human Motion Capture in Deformable Scenes
This work addresses a challenging, ill-posed problem in computer vision and robotics and enables more accurate human pose estimation in complex, deformable environments. The contribution nevertheless appears incremental, since it builds on prior motion capture work by adding deformation modeling.
The paper tackles the problem of 3D human motion capture from monocular RGB images in deformable scenes, proposing MoCapDeform, which explicitly models non-rigid scene deformations and achieves superior accuracy compared to existing methods on multiple datasets.
3D human motion capture from monocular RGB images respecting interactions of a subject with complex and possibly deformable environments is a very challenging, ill-posed and under-explored problem. Existing methods address it only weakly and do not model possible surface deformations often occurring when humans interact with scene surfaces. In contrast, this paper proposes MoCapDeform, i.e., a new framework for monocular 3D human motion capture that is the first to explicitly model non-rigid deformations of a 3D scene for improved 3D human pose estimation and deformable environment reconstruction. MoCapDeform accepts a monocular RGB video and a 3D scene mesh aligned in the camera space. It first localises a subject in the input monocular video along with dense contact labels using a new raycasting-based strategy. Next, our human-environment interaction constraints are leveraged to jointly optimise global 3D human poses and non-rigid surface deformations. MoCapDeform achieves higher accuracy than competing methods on several datasets, including our newly recorded one with deforming background scenes.
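
The abstract does not spell out how the raycasting-based localisation works, so the following is only a minimal sketch of the generic primitive such a strategy could build on: casting a ray from the camera through an image pixel and finding the nearest triangle of a scene mesh given in camera space. It uses the standard Möller-Trumbore ray-triangle test, which is swapped in here for illustration and is not claimed to be the authors' method. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) and the helper names `pixel_ray`, `ray_triangle` and `first_hit` are assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pixel_ray(u: float, v: float, fx: float, fy: float,
              cx: float, cy: float) -> Vec3:
    """Camera-space direction of the ray through pixel (u, v).

    Assumes an ideal pinhole camera looking along +z (hypothetical intrinsics).
    """
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def ray_triangle(origin: Vec3, direction: Vec3,
                 v0: Vec3, v1: Vec3, v2: Vec3,
                 eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore test: ray parameter t of the hit, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t_vec, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None    # keep only hits in front of the origin


def first_hit(origin: Vec3, direction: Vec3,
              vertices: Sequence[Vec3],
              faces: Sequence[Tuple[int, int, int]]
              ) -> Optional[Tuple[float, int]]:
    """Return (t, face_index) of the closest intersected face, or None."""
    best: Optional[Tuple[float, int]] = None
    for idx, (i, j, k) in enumerate(faces):
        t = ray_triangle(origin, direction, vertices[i], vertices[j], vertices[k])
        if t is not None and (best is None or t < best[0]):
            best = (t, idx)
    return best


# Toy scene: two parallel triangles at depths z = 2 and z = 5.
vertices: List[Vec3] = [(-1, -1, 2), (1, -1, 2), (0, 1, 2),
                        (-1, -1, 5), (1, -1, 5), (0, 1, 5)]
faces = [(3, 4, 5), (0, 1, 2)]
ray = pixel_ray(320, 240, fx=500, fy=500, cx=320, cy=240)
hit = first_hit((0.0, 0.0, 0.0), ray, vertices, faces)
```

Here the ray through the principal point travels along the optical axis and reaches the nearer triangle first, so `hit` identifies face index 1 at depth 2. A real pipeline would replace the brute-force loop with a spatial acceleration structure and would still need a separate rule for turning such hits into a subject position and contact labels, which this sketch does not attempt.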