Riggable 3D Face Reconstruction via In-Network Optimization
This work addresses the need for dynamic 3D face models in computer vision and graphics, offering an incremental improvement over existing methods by integrating in-network optimization.
This paper tackles the problem of reconstructing riggable 3D faces from monocular images by jointly estimating a personalized face rig and per-image parameters, achieving state-of-the-art reconstruction accuracy and enabling applications like video retargeting.
This paper presents a method for riggable 3D face reconstruction from monocular images, which jointly estimates a personalized face rig and per-image parameters including expressions, poses, and illuminations. To achieve this goal, we design an end-to-end trainable network embedded with a differentiable in-network optimization. The network first parameterizes the face rig as a compact latent code with a neural decoder, and then estimates the latent code as well as the per-image parameters via a learnable optimization. By estimating a personalized face rig, our method goes beyond static reconstructions and enables downstream applications such as video retargeting. In-network optimization explicitly enforces constraints derived from first principles, and thus introduces priors beyond those available to regression-based methods. Finally, data-driven priors from deep learning are utilized to constrain the ill-posed monocular setting and ease the optimization difficulty. Experiments demonstrate that our method achieves state-of-the-art reconstruction accuracy, reasonable robustness and generalization ability, and supports standard face rig applications.
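As a rough illustration of the joint estimation described above, the following toy sketch fits one shared rig code and one expression coefficient per image by a few unrolled gradient steps. It is not the paper's network: the decoder here is a fixed linear map rather than a neural decoder, the update is plain gradient descent with fixed step sizes rather than a learned optimizer, pose and illumination are omitted, and all names (`decode_neutral`, `render`, `fit`, `ID_BASIS`, `EXPR_BASIS`) and dimensions are assumptions made for illustration.

```python
# Toy sketch: a shared "rig" latent code z and per-image expression
# coefficients are estimated jointly by a few unrolled gradient steps.
# All names, sizes, and the linear decoder are illustrative assumptions.

MEAN = [0.0, 0.0, 0.0]                            # mean neutral shape (3 toy "vertices")
ID_BASIS = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # linear stand-in for the neural decoder
EXPR_BASIS = [0.0, 0.0, 1.0]                      # a single expression blendshape


def decode_neutral(z):
    """Map the rig latent code z to a personalized neutral shape."""
    return [MEAN[v] + sum(ID_BASIS[v][k] * z[k] for k in range(len(z)))
            for v in range(len(MEAN))]


def render(z, beta):
    """Shape for one image: neutral shape plus an expression offset."""
    neutral = decode_neutral(z)
    return [neutral[v] + beta * EXPR_BASIS[v] for v in range(len(neutral))]


def fit(observations, steps=20, step_z=0.5, step_beta=0.5):
    """Unrolled gradient descent on the squared fitting residual.

    In a learned optimizer the step sizes (or the whole update) would be
    predicted by a network; here they are fixed constants.
    """
    n = len(observations)
    z = [0.0, 0.0]
    betas = [0.0] * n
    for _ in range(steps):
        # Residual between the current prediction and each observed shape.
        residuals = [[p - o for p, o in zip(render(z, b), obs)]
                     for b, obs in zip(betas, observations)]
        # Shared rig code: average the gradient over all images.
        grad_z = [sum(r[v] * ID_BASIS[v][k]
                      for r in residuals for v in range(len(MEAN))) / n
                  for k in range(len(z))]
        # Per-image parameter: gradient from that image's residual only.
        grad_b = [sum(r[v] * EXPR_BASIS[v] for v in range(len(MEAN)))
                  for r in residuals]
        z = [zk - step_z * g for zk, g in zip(z, grad_z)]
        betas = [b - step_beta * g for b, g in zip(betas, grad_b)]
    return z, betas


# Synthetic, noise-free observations generated from known parameters.
true_z, true_betas = [0.3, -0.2], [0.0, 0.5, 1.0]
observations = [render(true_z, b) for b in true_betas]
est_z, est_betas = fit(observations)
```

Because the toy bases are orthogonal, the error in each unknown halves at every step, so `est_z` and `est_betas` end up close to `true_z` and `true_betas`. The point of the sketch is only the structure: one code shared across all images, separate parameters per image, and an optimization loop that can be unrolled inside a larger differentiable model.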