Self-supervised 3D Human Mesh Recovery from Noisy Point Clouds
This work addresses the challenge of reliable 3D human mesh recovery in noisy environments, which matters for applications such as robotics and AR/VR, though the contribution is incremental because it builds on existing self-supervised methods.
The paper tackles the problem of reconstructing human shape and pose from noisy point clouds. It introduces a self-supervised approach that models the input as a Gaussian Mixture Model to handle noise and outliers, and it reports better performance than state-of-the-art methods on synthetic datasets and on the real CMU Panoptic dataset.
This paper presents a novel self-supervised approach to reconstructing human shape and pose from noisy point cloud data. Recent learning-based approaches rely on large datasets with ground-truth annotations and predict a correspondence for every vertex of the point cloud; the Chamfer distance is typically used to minimize the distance between a deformed template model and the input point cloud. However, the Chamfer distance is sensitive to noise and outliers and can therefore assign correspondences unreliably. To address these issues, we model the input point cloud as generated from a parametric human model under a Gaussian Mixture Model. Instead of explicitly aligning correspondences, we treat correspondence search as an implicit probabilistic association, updating the posterior probability of the template model given the input. We further derive a novel self-supervised loss that penalizes the discrepancy between the deformed template and the input point cloud, conditioned on the posterior probability. Our approach is flexible: it works with both complete and incomplete point clouds, including a single depth image as input. Compared with previous self-supervised methods, our method can handle substantial noise and outliers. Extensive experiments on various public synthetic datasets, as well as a very noisy real dataset (CMU Panoptic), demonstrate the superior performance of our approach over state-of-the-art methods.
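The contrast between Chamfer-based matching and probabilistic association can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes an isotropic Gaussian with a shared variance `sigma2` centered on each template vertex, equal mixing weights, and a uniform outlier component with weight `outlier_weight` borrowed from the Coherent Point Drift formulation. The function names and the normalization of the loss are hypothetical choices made for illustration only.

```python
import numpy as np


def pairwise_sq_dists(template, points):
    """Squared Euclidean distances, shape (M, N), between template vertices and input points."""
    diff = template[:, None, :] - points[None, :, :]
    return (diff ** 2).sum(axis=-1)


def chamfer_distance(template, points):
    """Symmetric Chamfer distance: mean nearest-neighbor squared distance in both directions."""
    d2 = pairwise_sq_dists(template, points)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


def gmm_posterior(template, points, sigma2, outlier_weight=0.1):
    """Posterior P[m, n] that template vertex m generated input point n.

    Assumption: equal-weight isotropic Gaussians plus a uniform outlier
    term (as in Coherent Point Drift), not necessarily the paper's model.
    """
    m, dim = template.shape
    n = points.shape[0]
    d2 = pairwise_sq_dists(template, points)
    numer = np.exp(-d2 / (2.0 * sigma2))
    # Constant contributed by the uniform outlier component.
    c = (2.0 * np.pi * sigma2) ** (dim / 2.0)
    c *= outlier_weight / (1.0 - outlier_weight) * m / n
    return numer / (numer.sum(axis=0, keepdims=True) + c)


def posterior_weighted_loss(template, points, sigma2, outlier_weight=0.1):
    """Squared distances averaged under the posterior (illustrative normalization)."""
    p = gmm_posterior(template, points, sigma2, outlier_weight)
    d2 = pairwise_sq_dists(template, points)
    return (p * d2).sum() / max(p.sum(), 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.normal(size=(50, 3))
    clean = template + 0.01 * rng.normal(size=(50, 3))
    noisy = np.vstack([clean, [[10.0, 10.0, 10.0]]])  # one far outlier
    print("Chamfer (clean, noisy):",
          chamfer_distance(template, clean), chamfer_distance(template, noisy))
    print("Posterior-weighted (clean, noisy):",
          posterior_weighted_loss(template, clean, 0.01),
          posterior_weighted_loss(template, noisy, 0.01))
```

In this sketch a far outlier receives near-zero posterior mass from every template vertex, so it contributes almost nothing to the weighted loss, whereas the Chamfer distance must still match it to its nearest vertex. In a training loop the posterior would typically be treated as a constant when differentiating the loss, which is again an assumption here rather than a detail taken from the abstract.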