CV · Feb 6, 2018

Every Smile is Unique: Landmark-Guided Diverse Smile Generation

arXiv:1802.01873v3 · 70 citations
Originality: Incremental advance
AI Analysis

This work addresses the one-to-many video generation problem in facial expression synthesis. It offers incremental improvements in diversity and realism for applications such as entertainment and human-computer interaction.

The paper tackles the problem of generating multiple distinct smile videos from a single neutral face image. It proposes the Conditional Multi-Mode Network (CMM-Net), which uses facial landmarks and a variational auto-encoder to produce diverse and realistic smile sequences.
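The landmark embedding relies on a variational auto-encoder. The sketch below shows only the generic VAE mechanics, under stated assumptions: a flattened vector of (x, y) landmark coordinates is mapped by a hypothetical linear encoder to the mean and log-variance of a diagonal Gaussian, a latent code is drawn with the reparameterization trick, and the closed-form KL penalty against a standard normal prior is computed. The sizes, weights and function names (`encode`, `reparameterize`, `kl_to_standard_normal`) are illustrative and not taken from the paper, which uses learned neural encoders.

```python
import math
import random

def encode(landmarks, w_mu, w_logvar):
    """Hypothetical linear encoder: flattened landmarks -> (mu, logvar) of q(z | x)."""
    mu = [sum(w * x for w, x in zip(row, landmarks)) for row in w_mu]
    logvar = [sum(w * x for w, x in zip(row, landmarks)) for row in w_logvar]
    return mu, logvar

def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

# Toy example (hypothetical sizes): 5 landmarks -> 10 coordinates, 3-dimensional latent.
rng = random.Random(0)
landmarks = [rng.uniform(-1.0, 1.0) for _ in range(10)]
w_mu = [[rng.gauss(0.0, 0.1) for _ in landmarks] for _ in range(3)]
w_logvar = [[rng.gauss(0.0, 0.1) for _ in landmarks] for _ in range(3)]
mu, logvar = encode(landmarks, w_mu, w_logvar)
z = reparameterize(mu, logvar, rng)
print(len(z), round(kl_to_standard_normal(mu, logvar), 4))
```

Sampling through `mu + sigma * eps` keeps the latent code differentiable with respect to the encoder outputs, which is why VAEs are trained this way; the KL term pulls the embedding toward a smooth, well-covered latent space.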

Each smile is unique: the same person smiles in different ways (e.g., closing or opening the eyes or mouth). Given one input image of a neutral face, can we generate multiple smile videos with distinctive characteristics? To tackle this one-to-many video generation problem, we propose a novel deep learning architecture named Conditional Multi-Mode Network (CMM-Net). To better encode the dynamics of facial expressions, CMM-Net explicitly exploits facial landmarks for generating smile sequences. Specifically, a variational auto-encoder is used to learn a facial landmark embedding. This single embedding is then exploited by a conditional recurrent network, which generates a landmark embedding sequence conditioned on a specific expression (e.g., spontaneous smile). Next, the generated landmark embeddings are fed into a multi-mode recurrent landmark generator, producing a set of landmark sequences that are still associated with the given smile class but clearly distinct from each other. Finally, these landmark sequences are translated into face videos. Our experimental results demonstrate the effectiveness of CMM-Net in generating realistic videos of multiple smile expressions.
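The data flow described in the abstract (one embedding, then a class-conditioned embedding sequence, then several distinct landmark sequences) can be sketched as a toy pipeline. This is an assumption-laden illustration, not the paper's implementation: plain tanh recurrent cells with untrained random weights stand in for the learned recurrent networks, a hypothetical `class_bias` vector stands in for the expression-class conditioning, and each of the K modes is a separate recurrent generator with its own weights. All names and sizes (`conditional_rnn`, `multi_mode_generate`, K = 3, 8 frames, 5 landmarks) are hypothetical, and the final landmark-to-video translation step is omitted.

```python
import math
import random

def linear(weights, vec):
    """Matrix-vector product with weights given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def rand_matrix(rows, cols, rng, scale=0.3):
    """Random Gaussian weights standing in for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def conditional_rnn(z0, class_bias, w_rec, steps):
    """Roll a toy tanh RNN forward from the initial landmark embedding z0.
    class_bias is a hypothetical stand-in for conditioning on an expression class."""
    seq, h = [], list(z0)
    for _ in range(steps):
        h = [math.tanh(a + b) for a, b in zip(linear(w_rec, h), class_bias)]
        seq.append(h)
    return seq

def multi_mode_generate(embeddings, modes):
    """Run K mode-specific recurrent generators over the same embedding sequence.
    Each mode owns its input, recurrent and output weights, so the K landmark
    sequences share the conditioning but follow different trajectories."""
    outputs = []
    for w_in, w_rec, w_out in modes:
        h = [0.0] * len(w_rec)
        seq = []
        for e in embeddings:
            h = [math.tanh(a + b) for a, b in zip(linear(w_in, e), linear(w_rec, h))]
            seq.append(linear(w_out, h))  # one frame of flattened (x, y) landmarks
        outputs.append(seq)
    return outputs

# Toy sizes (hypothetical): 4-d embedding, 6 hidden units, 5 landmarks, 8 frames, 3 modes.
rng = random.Random(42)
latent, hidden, n_coords, steps, k_modes = 4, 6, 2 * 5, 8, 3
z0 = [rng.gauss(0.0, 1.0) for _ in range(latent)]
class_bias = [rng.gauss(0.0, 0.5) for _ in range(latent)]
embeddings = conditional_rnn(z0, class_bias, rand_matrix(latent, latent, rng), steps)
modes = [(rand_matrix(hidden, latent, rng), rand_matrix(hidden, hidden, rng),
          rand_matrix(n_coords, hidden, rng)) for _ in range(k_modes)]
landmark_seqs = multi_mode_generate(embeddings, modes)
print(len(landmark_seqs), len(landmark_seqs[0]), len(landmark_seqs[0][0]))  # 3 8 10
```

Here the modes differ only because their random weights differ; in a trained model, keeping the sequences realistic for the smile class while making them clearly distinct from each other would have to come from the training objectives, which this sketch does not model.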
