MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
This work aims to make neural parametric face models easily accessible to researchers and practitioners in computer vision and graphics, and represents an incremental improvement in dynamic face reconstruction from monocular videos.
The paper tackles dynamic 3D head reconstruction from monocular RGB videos by proposing MonoNPHM, a method that uses a latent appearance space and hyper-dimensional deformation fields to improve geometry and color representation, achieving significant performance gains over baselines on 20 challenging Kinect sequences.
We present Monocular Neural Parametric Head Models (MonoNPHM) for dynamic 3D head reconstruction from monocular RGB videos. To this end, we propose a latent appearance space that parameterizes a texture field on top of a neural parametric model. We constrain predicted color values to be correlated with the underlying geometry such that gradients from RGB effectively influence latent geometry codes during inverse rendering. To increase the representational capacity of our expression space, we augment our backward deformation field with hyper-dimensions, thus improving color and geometry representation in topologically challenging expressions. Using MonoNPHM as a learned prior, we approach the task of 3D head reconstruction using volumetric rendering based on signed distance fields. By numerically inverting our backward deformation field, we incorporate a landmark loss using facial anchor points that are closely tied to our canonical geometry representation. To evaluate the task of dynamic face reconstruction from monocular RGB videos, we record 20 challenging Kinect sequences under casual conditions. MonoNPHM outperforms all baselines by a significant margin and makes an important step towards easily accessible neural parametric face models through RGB tracking.
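To illustrate what numerically inverting a backward deformation field can look like, the following is a minimal sketch and not the method used in the paper. It assumes the backward field maps a posed-space point x_p to canonical space as x_c = x_p + d(x_p), and it recovers x_p for a given canonical anchor by fixed-point iteration, which converges when d is a contraction. The function `toy_backward_deformation` is a hypothetical analytic stand-in for the learned network, and all names and parameters are illustrative assumptions.

```python
import math


def toy_backward_deformation(x_posed):
    """Hypothetical stand-in for a learned backward deformation field.

    Returns the offset d(x_p) that moves a posed-space point to canonical
    space. The 0.1 scale keeps the offset a contraction (assumption).
    """
    x, y, z = x_posed
    return (0.1 * math.sin(y), 0.1 * math.sin(z) + 0.05, 0.1 * math.sin(x))


def warp_to_canonical(x_posed, deformation=toy_backward_deformation):
    """Backward warp: x_c = x_p + d(x_p)."""
    offset = deformation(x_posed)
    return tuple(p + o for p, o in zip(x_posed, offset))


def invert_backward_deformation(x_canonical,
                                deformation=toy_backward_deformation,
                                num_iters=50, tol=1e-10):
    """Solve x_p + d(x_p) = x_c for x_p via fixed-point iteration.

    Iterates x_p <- x_c - d(x_p), starting from x_p = x_c.
    """
    x_posed = tuple(x_canonical)
    for _ in range(num_iters):
        offset = deformation(x_posed)
        x_next = tuple(c - o for c, o in zip(x_canonical, offset))
        step = max(abs(a - b) for a, b in zip(x_next, x_posed))
        x_posed = x_next
        if step < tol:
            break
    return x_posed


# A canonical-space facial anchor point (illustrative coordinates).
anchor_canonical = (0.3, -0.2, 0.5)
anchor_posed = invert_backward_deformation(anchor_canonical)

# Round-trip check: warping the recovered point back should hit the anchor.
roundtrip = warp_to_canonical(anchor_posed)
residual = max(abs(a - b) for a, b in zip(roundtrip, anchor_canonical))
```

Under these assumptions, the recovered posed-space anchor could then be projected into the image and compared against detected 2D landmarks to form a landmark loss. A learned field offers no contraction guarantee, so a real implementation might need a damped update or a root-finding method instead.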