CV, Mar 5, 2023

Text2Face: A Multi-Modal 3D Face Model

arXiv:2303.02688v2, 4 citations, h-index: 27
Originality: Highly original
AI Analysis

The paper tackles the problem of generating 3D face shapes from textual prompts. It introduces Text2Face, a multi-modal 3D morphable model (3DMM) that extends the FLAME head model to a common image-and-text latent space, so that 3DMM parameters can be generated directly from text.

This enables applications such as generating police photofits from natural-language descriptions, serving needs in law enforcement and creative domains.

We present the first 3D morphable modelling approach whereby 3D face shape can be directly and completely defined using a textual prompt. Building on work in multi-modal learning, we extend the FLAME head model to a common image-and-text latent space. This allows for direct 3D Morphable Model (3DMM) parameter generation, and therefore shape manipulation, from textual descriptions. Our method, Text2Face, has many applications, for example generating police photofits, where the input is already in natural language. It further enables multi-modal 3DMM image fitting to sketches and sculptures, as well as images.
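To make the idea of going from a shared latent space to 3DMM parameters concrete, here is a minimal sketch. It is not the paper's implementation: the dimensions, the random "learned" weights, the linear mapper, and the function names `embedding_to_params` and `params_to_vertices` are all hypothetical. It also covers only a linear shape component, whereas FLAME additionally models pose and expression.

```python
import random

# Toy sizes chosen only for illustration (real head meshes are far larger).
EMBED_DIM = 8     # size of the shared image-and-text embedding (assumed)
NUM_PARAMS = 4    # number of 3DMM shape parameters (assumed)
NUM_COORDS = 15   # 5 vertices x 3 coordinates, flattened

rng = random.Random(0)

def random_matrix(rows, cols, scale=0.1):
    """Build a rows x cols matrix of small Gaussian values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical stand-ins for quantities that would be learned from data.
mean_shape = [rng.gauss(0.0, 1.0) for _ in range(NUM_COORDS)]
shape_basis = random_matrix(NUM_COORDS, NUM_PARAMS)     # linear 3DMM basis
mapper_weights = random_matrix(NUM_PARAMS, EMBED_DIM)   # embedding -> parameters

def embedding_to_params(embedding):
    """Map a shared-space embedding to shape parameters (linear stand-in)."""
    return matvec(mapper_weights, embedding)

def params_to_vertices(params):
    """Linear morphable model: mean shape plus a weighted sum of basis vectors."""
    offsets = matvec(shape_basis, params)
    return [m + o for m, o in zip(mean_shape, offsets)]

# Placeholder for the embedding a text (or image) encoder would produce.
text_embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
params = embedding_to_params(text_embedding)
vertices = params_to_vertices(params)
```

Because the mapper takes an embedding rather than raw text, the same path would in principle serve any input that can be encoded into the shared space, which is consistent with the abstract's mention of fitting to sketches and sculptures as well as images.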

