From 2D Images to 3D Model: Weakly Supervised Multi-View Face Reconstruction with Deep Fusion
This work addresses the challenge of creating accurate 3D face models from 2D images with limited supervision, which is useful for applications in computer vision and graphics. It appears incremental, however, since it builds on existing weakly supervised multi-view face reconstruction (MVR) approaches.
The paper tackles weakly supervised multi-view face reconstruction by proposing DF-MVR, a pipeline that fuses features from multiple images to reconstruct 3D faces without 3D annotations. It reports 5.2% and 3.0% RMSE improvements over existing weakly supervised methods on the Pixel-Face and Bosphorus datasets, respectively.
While weakly supervised multi-view face reconstruction (MVR) is garnering increased attention, one critical issue remains open: how to make information from multiple images interact and fuse effectively in order to reconstruct high-precision 3D models. In this regard, we propose a novel pipeline called Deep Fusion MVR (DF-MVR) to explore the feature correspondences between multi-view images and reconstruct high-precision 3D faces. Specifically, we present a novel multi-view feature fusion backbone that utilizes face masks to align features from multiple encoders and integrates a multi-layer attention mechanism to enhance feature interaction and fusion, resulting in a unified facial representation. Additionally, we develop a concise face mask mechanism that facilitates multi-view feature fusion and facial reconstruction by identifying common areas and guiding the network's focus toward critical facial features (e.g., eyes, brows, nose, and mouth). Experiments on the Pixel-Face and Bosphorus datasets indicate the superiority of our model. Without 3D annotation, DF-MVR achieves 5.2% and 3.0% RMSE improvements over existing weakly supervised MVR methods on the Pixel-Face and Bosphorus datasets, respectively. Code will be publicly available at https://github.com/weiguangzhao/DF_MVR.
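
To make the idea of mask-guided multi-view fusion concrete, the following is a minimal sketch and not the DF-MVR implementation. It assumes each view has already been encoded into a small list of per-position feature vectors, uses a binary face mask to pool only the masked positions, and then applies a single layer of scaled dot-product attention across views without learned projections. The function names `masked_pool` and `attention_fuse`, the toy feature values, and the final averaging step are all illustrative assumptions; the abstract describes a multi-layer attention mechanism whose details are not given here.

```python
import math


def masked_pool(feature_map, mask):
    """Average the feature vectors at positions where the mask is 1.

    feature_map: list of positions, each a list of C floats (hypothetical layout).
    mask: list of 0/1 flags, one per position (1 = facial region of interest).
    """
    kept = [vec for vec, keep in zip(feature_map, mask) if keep]
    if not kept:
        raise ValueError("mask selects no positions")
    channels = len(kept[0])
    return [sum(vec[c] for vec in kept) / len(kept) for c in range(channels)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(view_vectors):
    """Fuse per-view vectors with one layer of scaled dot-product attention.

    Each view attends to every view (itself included); the attended vectors
    are then averaged into one representation. Assumption: no learned
    query/key/value projections, unlike a trained attention layer.
    """
    dim = len(view_vectors[0])
    scale = math.sqrt(dim)
    attended = []
    for query in view_vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in view_vectors]
        weights = softmax(scores)
        attended.append([sum(w * value[c] for w, value in zip(weights, view_vectors))
                         for c in range(dim)])
    return [sum(vec[c] for vec in attended) / len(attended) for c in range(dim)]


# Toy data: three views, three positions each, two channels (made-up values).
features = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]],
    [[1.0, 2.0], [3.0, 0.0], [1.0, 0.0]],
]
masks = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]

pooled = [masked_pool(f, m) for f, m in zip(features, masks)]
fused = attention_fuse(pooled)
print(fused)
```

Pooling with the mask before attention keeps the sketch short; a real backbone would more likely keep spatial feature maps and apply attention per position, which this example does not attempt to reproduce.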