CV · Mar 5, 2023
Text2Face: A Multi-Modal 3D Face Model
Will Rowan, Patrik Huber, Nick Pears et al.
We present the first 3D morphable modelling approach whereby 3D face shape can be directly and completely defined using a textual prompt. Building on work in multi-modal learning, we extend the FLAME head model to a common image-and-text latent space. This allows direct 3D Morphable Model (3DMM) parameter generation, and therefore shape manipulation, from textual descriptions. Our method, Text2Face, has many applications, for example generating police photofits where the input is already in natural language. It further enables multi-modal 3DMM fitting to images, sketches, and sculptures.
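To make "direct 3DMM parameter generation" concrete, the sketch below shows the linear shape component that 3DMMs such as FLAME use (vertices = mean + sum of coefficient-weighted basis vectors), plus a hypothetical linear map from a shared image/text embedding to those coefficients. The toy dimensions, the random basis, and `embedding_to_betas` are illustrative assumptions, not the Text2Face architecture; a real system would use FLAME's released shape basis and an embedding from a trained multi-modal encoder.

```python
import random


def reconstruct_shape(mean, basis, betas):
    """Linear 3DMM shape: each coordinate = mean + sum_k betas[k] * basis[k]."""
    if len(basis) != len(betas):
        raise ValueError("one coefficient per basis vector is required")
    shape = list(mean)
    for coeff, component in zip(betas, basis):
        for i, value in enumerate(component):
            shape[i] += coeff * value
    return shape


def embedding_to_betas(embedding, weights, bias):
    """Hypothetical linear map from a shared image/text embedding to shape coefficients."""
    return [sum(w * e for w, e in zip(row, embedding)) + b for row, b in zip(weights, bias)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_coords = 3 * 4   # toy mesh: 4 vertices, flattened as (x, y, z) triples
    n_components = 2   # toy number of shape coefficients
    embed_dim = 3      # toy embedding size
    mean = [0.0] * n_coords
    basis = [[rng.gauss(0, 1) for _ in range(n_coords)] for _ in range(n_components)]
    weights = [[rng.gauss(0, 1) for _ in range(embed_dim)] for _ in range(n_components)]
    bias = [0.0] * n_components
    text_embedding = [0.5, -0.2, 0.1]  # stand-in for an encoded text prompt
    betas = embedding_to_betas(text_embedding, weights, bias)
    vertices = reconstruct_shape(mean, basis, betas)
    print(len(vertices))
```

Because the shape model is linear in its coefficients, anything that can predict the coefficient vector (here a toy linear layer) fully determines the mesh, which is why a shared embedding space is enough to drive shape from text, images, or sketches.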
CV · Jul 25, 2023
Fake It Without Making It: Conditioned Face Generation for Accurate 3D Face Reconstruction
Will Rowan, Patrik Huber, Nick Pears et al.
Accurate 3D face reconstruction from 2D images is an enabling technology with applications in healthcare, security, and creative industries. However, current state-of-the-art methods rely either on supervised training with very limited 3D data or on self-supervised training with 2D image data. To bridge this gap, we present a method to generate a large-scale synthesised dataset of 250K photorealistic images and their corresponding shape parameters and depth maps, which we call SynthFace. Our synthesis method conditions Stable Diffusion on depth maps sampled from the FLAME 3D Morphable Model (3DMM) of the human face, allowing us to generate a diverse set of shape-consistent facial images that is designed to be balanced in race and gender. We further propose ControlFace, a deep neural network trained on SynthFace that achieves competitive performance on the NoW benchmark without requiring 3D supervision or manual 3D asset creation. The complete SynthFace dataset will be made publicly available upon publication.
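One step of a SynthFace-style pipeline that can be illustrated without a renderer or diffusion model is planning the generation conditions: drawing shape coefficients and assigning demographic attributes so that the dataset stays balanced. The sketch below assumes coefficients are drawn from a standard normal (a common choice for PCA-based shape models) and cycles through every attribute combination so per-combination counts differ by at most one. The attribute labels, the prompt template, and `balanced_conditions` itself are hypothetical; rendering FLAME depth maps and conditioning Stable Diffusion on them are not shown.

```python
import itertools
import random
from collections import Counter


def balanced_conditions(n_samples, attribute_groups, n_shape, seed=0):
    """Plan (attributes, shape coefficients, prompt) records with balanced attributes.

    Each full cycle visits every attribute combination exactly once, so the
    per-combination counts differ by at most one for any n_samples.
    """
    keys = list(attribute_groups)
    combos = list(itertools.product(*(attribute_groups[k] for k in keys)))
    rng = random.Random(seed)
    order = list(combos)
    plan = []
    for i in range(n_samples):
        if i % len(combos) == 0:
            rng.shuffle(order)  # fresh random order for each full cycle
        attrs = dict(zip(keys, order[i % len(combos)]))
        # Shape coefficients drawn from a standard normal (assumed prior).
        betas = [rng.gauss(0.0, 1.0) for _ in range(n_shape)]
        # Hypothetical prompt template for the depth-conditioned generator.
        prompt = "a portrait photo of a person, " + ", ".join(
            f"{k}: {v}" for k, v in attrs.items()
        )
        plan.append({"attributes": attrs, "betas": betas, "prompt": prompt})
    return plan


if __name__ == "__main__":
    groups = {"gender": ["g1", "g2"], "group": ["e1", "e2", "e3"]}  # placeholder labels
    plan = balanced_conditions(10, groups, n_shape=5)
    counts = Counter(tuple(p["attributes"].values()) for p in plan)
    print(sorted(counts.values()))
```

Cycling through the full cartesian product, rather than sampling each attribute independently at random, guarantees balance at every dataset size instead of only in expectation; each planned record would then be turned into a FLAME mesh, rendered to a depth map, and passed with its prompt to the depth-conditioned generator.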