HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos
This work addresses the need for editable, high-fidelity human avatars in graphics applications, though it appears incremental because it builds on existing implicit neural representation methods.
The paper tackles the problem of creating editable human avatars from monocular video by developing a framework that produces triangular meshes with high-resolution physically-based material textures, outperforming previous representations in fidelity while supporting deployment on common renderers.
Recently, implicit neural representations have been widely used to generate animatable human avatars. However, the materials and geometry of those representations are coupled in the neural network and hard to edit, which hinders their application in traditional graphics engines. We present a framework for acquiring human avatars with high-resolution physically-based material textures and a triangular mesh from monocular video. Our method introduces a novel information fusion strategy that combines information from the monocular video and synthesizes virtual multi-view images to address the sparsity of the input views. We reconstruct humans as deformable neural implicit surfaces and extract a triangle mesh in a well-behaved pose as the initial mesh for the next stage. In addition, we introduce an approach to correct the bias in the boundary and size of the extracted coarse mesh. Finally, we adapt the prior knowledge of a latent diffusion model for multi-view super-resolution to distill the decomposed texture. Experiments show that our approach outperforms previous representations in fidelity, and its explicit output supports deployment on common renderers.
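
The step of turning an implicit surface into explicit geometry can be illustrated with a small sketch. The code below is not the paper's pipeline: it assumes a hand-written sphere signed distance function (`sphere_sdf`, a hypothetical stand-in for a learned neural implicit surface) and only locates the points where the function changes sign along grid edges, which is the basic operation that mesh extractors such as marching cubes build on. The grid resolution and bounds are arbitrary illustrative choices.

```python
import math


def sphere_sdf(x, y, z, radius=0.5):
    """Signed distance to a sphere at the origin (stand-in for a learned surface)."""
    return math.sqrt(x * x + y * y + z * z) - radius


def surface_points(sdf, resolution=16, bound=1.0):
    """Return points where the SDF changes sign along axis-aligned grid edges."""
    step = 2.0 * bound / resolution
    coords = [-bound + i * step for i in range(resolution + 1)]

    # Evaluate the SDF once at every grid vertex.
    values = {}
    for i, x in enumerate(coords):
        for j, y in enumerate(coords):
            for k, z in enumerate(coords):
                values[(i, j, k)] = sdf(x, y, z)

    points = []
    for (i, j, k), v0 in values.items():
        # Visit the three edges leaving this vertex in the +x, +y, +z directions.
        for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            neighbor = (i + di, j + dj, k + dk)
            if neighbor not in values:
                continue
            v1 = values[neighbor]
            # A sign change means the surface crosses this edge.
            if (v0 < 0.0) != (v1 < 0.0):
                # Linear interpolation estimates where the SDF reaches zero.
                t = v0 / (v0 - v1)
                p0 = (coords[i], coords[j], coords[k])
                p1 = tuple(coords[n] for n in neighbor)
                points.append(tuple(a + t * (b - a) for a, b in zip(p0, p1)))
    return points


points = surface_points(sphere_sdf)
```

A full extractor would additionally connect these crossings into triangles per grid cell; the resulting coarse mesh is the kind of output that a later refinement stage, such as the boundary and size correction mentioned in the abstract, would start from.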