CV · Oct 24, 2022

Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement

arXiv:2210.13529v2 · 22 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses 3D mesh reconstruction for multiple interacting persons, a computer vision problem. It is an incremental advance: it builds on existing methods and adds new refinement techniques.

The paper tackles the problem of estimating 3D poses and shapes as meshes from monocular RGB images, particularly for interacting persons who occlude one another. It proposes a coarse-to-fine pipeline that outperforms state-of-the-art methods on the 3DPW, MuPoTS, and AGORA datasets.

Estimating 3D poses and shapes in the form of meshes from monocular RGB images is challenging, and clearly more difficult than estimating 3D poses only in the form of skeletons or heatmaps. When interacting persons are involved, 3D mesh reconstruction becomes even more challenging because person-to-person occlusions introduce ambiguity. To tackle these challenges, we propose a coarse-to-fine pipeline that benefits from 1) inverse kinematics applied to occlusion-robust 3D skeleton estimates and 2) Transformer-based relation-aware refinement. In our pipeline, we first obtain occlusion-robust 3D skeletons for multiple persons from an RGB image. Then, we apply inverse kinematics to convert the estimated skeletons into deformable 3D mesh parameters. Finally, we apply Transformer-based mesh refinement, which refines the obtained mesh parameters by considering intra- and inter-person relations of the 3D meshes. Extensive experiments demonstrate the effectiveness of our method, which outperforms state-of-the-art methods on the 3DPW, MuPoTS, and AGORA datasets.
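The abstract gives no implementation details, so the following is only a toy NumPy sketch, under stated assumptions, of two ideas it names. Part (a) is a swing-only analytic inverse kinematics step. It converts an estimated skeleton into per-joint local rotations by finding, with Rodrigues' formula, the smallest rotation that maps each rest-pose bone onto the corresponding estimated bone. Part (b) is masked self-attention that lets each token attend only to tokens of the same person, which is one simple way to separate intra-person from inter-person relations. The function names (`rotation_between`, `chain_ik`, `forward_kinematics`, `masked_attention`, `intra_person_mask`) are hypothetical. The single joint chain, the zero twist about each bone, and the identity attention projections are likewise assumptions. This is not the authors' method: a real system would work on a full kinematic tree of a parametric body model and use learned Transformer layers.

```python
import numpy as np


def rotation_between(a, b):
    """Smallest rotation R with R @ unit(a) == unit(b), via Rodrigues' formula."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)                      # rotation axis scaled by sin(theta)
    s, c = np.linalg.norm(v), float(np.dot(a, b))
    if s < 1e-8:
        if c > 0:
            return np.eye(3)                # already aligned
        # Anti-parallel: 180-degree turn about any axis orthogonal to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    k = v / s                               # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)


def chain_ik(template, target):
    """Hypothetical swing-only IK for one chain (joint i is the parent of i + 1).

    Returns (J - 1, 3, 3) local rotations; the twist about each bone stays zero.
    """
    local, parent = [], np.eye(3)           # parent = accumulated global rotation
    for i in range(len(template) - 1):
        rest_bone = template[i + 1] - template[i]
        # Express the estimated bone in the parent's rotated frame.
        goal = parent.T @ (target[i + 1] - target[i])
        R = rotation_between(rest_bone, goal)
        local.append(R)
        parent = parent @ R
    return np.stack(local)


def forward_kinematics(template, local):
    """Re-pose the chain by rotating each rest bone with its global rotation."""
    joints, G = [template[0]], np.eye(3)
    for i, R in enumerate(local):
        G = G @ R
        joints.append(joints[-1] + G @ (template[i + 1] - template[i]))
    return np.stack(joints)


def masked_attention(x, mask):
    """Single-head self-attention with identity projections; mask[i, j] lets i see j."""
    scores = np.where(mask, x @ x.T / np.sqrt(x.shape[-1]), -np.inf)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(scores)                             # masked entries become 0
    return (w / w.sum(axis=1, keepdims=True)) @ x


def intra_person_mask(num_persons, tokens_per_person):
    """True where two tokens belong to the same person."""
    owner = np.repeat(np.arange(num_persons), tokens_per_person)
    return owner[:, None] == owner[None, :]


# Usage: recover a rigidly rotated copy of a toy rest-pose chain.
rng = np.random.default_rng(0)
template = np.cumsum(rng.normal(size=(5, 3)), axis=0)
true_R = rotation_between(np.array([0.0, 0.0, 1.0]), np.array([1.0, 1.0, 0.0]))
target = template[0] + (template - template[0]) @ true_R.T
posed = forward_kinematics(template, chain_ik(template, target))
print(np.allclose(posed, target))  # True
```

Because swing-only IK fixes each bone's direction but leaves the rotation about the bone axis undetermined, following the analytic step with a learned refinement stage is one plausible way to correct what the skeleton alone cannot constrain.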

Code Implementations: 1 repo
