Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction
This work addresses accurate 3D human reconstruction from images, a problem relevant to applications in virtual reality and animation. It represents an incremental improvement over existing implicit function-based methods.
The paper tackles the problem of reconstructing a 3D mesh of a clothed person from a single monocular image. On average, it achieves a 42.7% reduction in Chamfer and Point-to-Surface distances and a 19.4% reduction in normal estimation errors relative to the prior state of the art.
We propose Geo-PIFu, a method to recover a 3D mesh from a monocular color image of a clothed person. Our method builds on a deep implicit function-based representation and learns latent voxel features using a structure-aware 3D U-Net. These features constrain the model in two ways: first, they resolve feature ambiguities in query point encoding; second, they serve as a coarse human shape proxy that regularizes the high-resolution mesh and encourages global shape regularity. We show that, by both encoding query points and constraining global shape with latent voxel features, the reconstructed clothed human meshes exhibit less shape distortion and improved surface details compared to competing methods. We evaluate Geo-PIFu on a recent public human mesh dataset that is $10\times$ larger than the private commercial dataset used in PIFu and previous derivative work. On average, we improve on the state of the art with a $42.7\%$ reduction in Chamfer and Point-to-Surface distances and a $19.4\%$ reduction in normal estimation errors.
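To make the idea of encoding a query point with both pixel-aligned and geometry-aligned features concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes an orthographic projection (the 2D location of a point is its x, y coordinates), a pixel feature map and a latent voxel grid that share the same x, y resolution, and plain bilinear and trilinear interpolation. The function names `bilinear_sample`, `trilinear_sample`, and `encode_query` are hypothetical and introduced only for illustration.

```python
import numpy as np

def bilinear_sample(feat2d, xy):
    """Sample a (C, H, W) feature map at continuous pixel coordinates (x, y)."""
    _, H, W = feat2d.shape
    x = float(np.clip(xy[0], 0, W - 1))
    y = float(np.clip(xy[1], 0, H - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    tx, ty = x - x0, y - y0
    # Interpolate along x on the two neighboring rows, then along y.
    top = (1 - tx) * feat2d[:, y0, x0] + tx * feat2d[:, y0, x1]
    bot = (1 - tx) * feat2d[:, y1, x0] + tx * feat2d[:, y1, x1]
    return (1 - ty) * top + ty * bot

def trilinear_sample(feat3d, xyz):
    """Sample a (C, D, H, W) voxel feature grid at continuous coordinates (x, y, z)."""
    _, D, _, _ = feat3d.shape
    z = float(np.clip(xyz[2], 0, D - 1))
    z0 = int(np.floor(z))
    z1 = min(z0 + 1, D - 1)
    tz = z - z0
    # Bilinear interpolation on the two neighboring depth slices, then along z.
    near = bilinear_sample(feat3d[:, z0], xyz[:2])
    far = bilinear_sample(feat3d[:, z1], xyz[:2])
    return (1 - tz) * near + tz * far

def encode_query(pixel_feat, voxel_feat, point):
    """Concatenate pixel-aligned features, geometry-aligned features, and depth.

    Assumption: `point` = (x, y, z) is given in the shared grid coordinates,
    and the projection to the image plane simply drops z (orthographic).
    """
    f2d = bilinear_sample(pixel_feat, point[:2])   # pixel-aligned, shape (C2,)
    f3d = trilinear_sample(voxel_feat, point)      # geometry-aligned, shape (C3,)
    return np.concatenate([f2d, f3d, [point[2]]])  # shape (C2 + C3 + 1,)

# Toy inputs: 2 image feature channels and 3 latent voxel channels on a 4^3 grid.
pixel_feat = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
voxel_feat = np.arange(3 * 4 * 4 * 4, dtype=float).reshape(3, 4, 4, 4)
code = encode_query(pixel_feat, voxel_feat, (1.5, 2.5, 0.5))
print(code.shape)
```

Under these assumptions, two query points that project to the same pixel receive identical pixel-aligned features but generally different voxel features, which illustrates how the 3D features can help resolve the ambiguity described above. An implicit function (for example, an MLP predicting occupancy) would then consume this vector; that network is omitted here. In a PyTorch setting, `torch.nn.functional.grid_sample` provides batched bilinear and trilinear sampling and would be a natural replacement for the hand-written interpolation.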