CV | LG | IV | Oct 10, 2021

Self-Supervised 3D Face Reconstruction via Conditional Estimation

arXiv:2110.04800v1 | 25 citations
Originality: Incremental advance
AI Analysis

The paper addresses 3D face reconstruction for computer vision applications. The advance is incremental: it builds on existing analysis-by-synthesis methods, adding a conditioning scheme and new training strategies.

The paper tackles learning 3D facial parameters from single-view 2D images without labeled data. It proposes a self-supervised conditional estimation framework and reports qualitative and quantitative experiments in support of its effectiveness.

We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos. CEST is based on the process of analysis by synthesis, where the 3D facial parameters (shape, reflectance, viewpoint, and illumination) are estimated from the face image, and then recombined to reconstruct the 2D face image. In order to learn semantically meaningful 3D facial parameters without explicit access to their labels, CEST couples the estimation of different 3D facial parameters by taking their statistical dependency into account. Specifically, the estimation of any 3D facial parameter is not only conditioned on the given image, but also on the facial parameters that have already been derived. Moreover, the reflectance symmetry and consistency among the video frames are adopted to improve the disentanglement of facial parameters. Together with a novel strategy for incorporating the reflectance symmetry and consistency, CEST can be efficiently trained with in-the-wild video clips. Both qualitative and quantitative experiments demonstrate the effectiveness of CEST.
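The conditioning idea in the abstract, where each parameter is estimated from the image together with the parameters already derived, can be illustrated with a minimal sketch. This is not the authors' implementation: the estimation order, the feature and parameter sizes, and the use of random linear maps in place of trained networks are all assumptions made for illustration, and the names `make_estimators` and `estimate` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
IMG_DIM = 64
PARAM_DIMS = {"viewpoint": 6, "shape": 32, "reflectance": 32, "illumination": 9}
# Assumed estimation order; the abstract does not specify one.
ORDER = ["viewpoint", "shape", "reflectance", "illumination"]


def make_estimators(order, img_dim, param_dims, rng):
    """Build one linear map per parameter. Its input is the image
    feature vector concatenated with every earlier parameter."""
    estimators = {}
    in_dim = img_dim
    for name in order:
        out_dim = param_dims[name]
        # Random weights stand in for a trained network.
        estimators[name] = rng.normal(scale=0.1, size=(out_dim, in_dim))
        in_dim += out_dim  # later estimators also see this parameter
    return estimators


def estimate(image_feat, estimators, order):
    """Estimate parameters one at a time, conditioning each on the image
    features and on the parameters already derived."""
    context = image_feat
    params = {}
    for name in order:
        params[name] = estimators[name] @ context
        context = np.concatenate([context, params[name]])
    return params


estimators = make_estimators(ORDER, IMG_DIM, PARAM_DIMS, rng)
image_feat = rng.normal(size=IMG_DIM)
params = estimate(image_feat, estimators, ORDER)
```

The growing `context` vector is what couples the estimates: the last estimator in the chain receives the image features plus all three earlier parameters, so its output depends on them rather than on the image alone.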

