LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic Occlusion-Aware Data and Neural Mesh Rendering
It addresses occlusion issues in 3D human pose and shape estimation for computer vision applications, representing an incremental improvement over existing methods.
The paper tackles the challenge of human pose and shape estimation under occlusions by proposing a framework that synthesizes occlusion-aware data and uses neural mesh rendering for silhouette supervision, achieving state-of-the-art results on datasets like 3DPW and SSP-3D with improved shape estimation.
A key challenge in the task of human pose and shape estimation is occlusion, including self-occlusions, object-human occlusions, and inter-person occlusions. The lack of diverse and accurate pose and shape training data becomes a major bottleneck, especially for scenes with occlusions in the wild. In this paper, we focus on the estimation of human pose and shape in the case of inter-person occlusions, while also handling object-human occlusions and self-occlusion. We propose a novel framework that synthesizes occlusion-aware silhouette and 2D keypoints data and directly regress to the SMPL pose and shape parameters. A neural 3D mesh renderer is exploited to enable silhouette supervision on the fly, which contributes to great improvements in shape estimation. In addition, keypoints-and-silhouette-driven training data in panoramic viewpoints are synthesized to compensate for the lack of viewpoint diversity in any existing dataset. Experimental results show that we are among the state-of-the-art on the 3DPW and 3DPW-Crowd datasets in terms of pose estimation accuracy. The proposed method evidently outperforms Mesh Transformer, 3DCrowdNet and ROMP in terms of shape estimation. Top performance is also achieved on SSP-3D in terms of shape prediction accuracy. Demo and code will be available at https://igame-lab.github.io/LASOR/.