CV · Dec 7, 2023

FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models

arXiv:2312.04465v3 · 5 citations · h-index: 81 · WACV
AI Analysis

This work addresses the need for robust and photorealistic 3D face reconstruction for applications in computer graphics and vision, representing an incremental advancement by applying diffusion models to a specific domain.

The paper tackles the problem of reconstructing relightable 3D facial avatars from a single unconstrained 2D image by introducing FitDiff, a diffusion-based model that concurrently outputs facial reflectance maps and shapes, achieving state-of-the-art performance.

The remarkable progress in 3D face reconstruction has resulted in high-detail and photorealistic facial representations. Recently, Diffusion Models have revolutionized the capabilities of generative methods by surpassing the performance of GANs. In this work, we present FitDiff, a diffusion-based 3D facial avatar generative model. Leveraging diffusion principles, our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image. The introduced multi-modal diffusion model is the first to concurrently output facial reflectance maps (diffuse and specular albedo and normals) and shapes, showcasing strong generalization capabilities. It is trained solely on an annotated subset of a public facial dataset, paired with 3D reconstructions. We revisit the typical 3D facial fitting approach by guiding a reverse diffusion process using perceptual and face recognition losses. Being the first 3D LDM conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in common rendering engines, starting only from an unconstrained facial image and achieving state-of-the-art performance.
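The fitting idea in the abstract, guiding a reverse diffusion process with face recognition (and perceptual) losses, can be illustrated with a toy sketch of loss-guided sampling. This is not FitDiff's implementation. Several pieces are assumptions: `denoise` is a hypothetical closed-form stand-in for the trained latent diffusion model (the exact posterior mean under a standard normal data prior), a sample serves as its own "face recognition embedding", only a cosine identity loss is used (no perceptual loss, no rendering), and the noise schedule and sampler are generic choices. At each step, the predicted clean sample is nudged down the loss gradient before a deterministic DDIM update.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (norm(a) * norm(b))


def alpha_bar(t: int, T: int, s: float = 0.008) -> float:
    """Cosine noise schedule, normalised so alpha_bar(0) == 1 and clamped away from 0."""
    f = math.cos(((t / T + s) / (1 + s)) * math.pi / 2) ** 2
    f0 = math.cos((s / (1 + s)) * math.pi / 2) ** 2
    return min(max(f / f0, 1e-4), 1.0)


def denoise(x_t: Vector, ab: float) -> Tuple[Vector, Vector]:
    """HYPOTHETICAL denoiser: exact E[x0 | x_t] for a N(0, I) data prior.
    A trained network would replace this in a real model."""
    x0_hat = [math.sqrt(ab) * v for v in x_t]
    eps_hat = [math.sqrt(1.0 - ab) * v for v in x_t]
    return x0_hat, eps_hat


def identity_loss_grad(x0: Vector, target: Vector) -> Vector:
    """Gradient of L = 1 - cos(x0, target) with respect to x0.
    ASSUMPTION: the sample itself stands in for its face recognition embedding."""
    na, nb = norm(x0), norm(target)
    c = dot(x0, target) / (na * nb)
    return [-(b / (na * nb) - c * a / (na * na)) for a, b in zip(x0, target)]


def guided_ddim(target: Vector, steps: int = 50, guidance: float = 1.0,
                seed: int = 0) -> Vector:
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # x_T ~ N(0, I)
    x0 = x
    for t in range(steps, 0, -1):
        ab, ab_prev = alpha_bar(t, steps), alpha_bar(t - 1, steps)
        x0_hat, eps_hat = denoise(x, ab)
        # Guidance: nudge the predicted clean sample down the identity-loss gradient.
        g = identity_loss_grad(x0_hat, target)
        x0 = [v - guidance * gi for v, gi in zip(x0_hat, g)]
        # Deterministic DDIM (eta = 0) update towards timestep t - 1.
        x = [math.sqrt(ab_prev) * a + math.sqrt(1.0 - ab_prev) * e
             for a, e in zip(x0, eps_hat)]
    return x0


# Usage with a made-up "identity embedding" (illustrative numbers only).
target = [1.0, -0.5, 0.25, 0.0, 0.8, -1.2, 0.3, 0.6]
plain = guided_ddim(target, guidance=0.0, seed=1)
fitted = guided_ddim(target, guidance=1.0, seed=1)
print(cosine(plain, target), cosine(fitted, target))
```

With `guidance=0.0`, the toy sampler only rescales the initial noise, so its identity similarity stays at chance level. With guidance switched on, each step rotates the sample towards the target embedding. Applying the gradient to the clean-sample estimate is just one common way to inject a loss into sampling. The abstract does not specify which sampler or guidance rule FitDiff uses.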
