CV · Nov 21, 2019

FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis

arXiv:1911.09224v1 · 63 citations · Has Code
Originality: Incremental advance
AI Analysis

This work improves talking face synthesis for applications like video generation or virtual avatars, but it is incremental as it builds on existing warping-based methods by using multiple source images.

The paper tackles the problem of generating faithful talking facial animations by addressing the inability of previous single-image methods to synthesize hidden facial regions like eyes or teeth, achieving higher performance both quantitatively and qualitatively compared to baseline algorithms.

Talking face synthesis has been widely studied with both appearance-based and warping-based methods. Previous works mostly use a single face image as the source and generate novel facial animations by merging another person's facial features. However, some facial regions, such as the eyes or teeth, may be hidden in the source image and cannot be synthesized faithfully and stably. In this paper, we present a landmark-driven two-stream network to generate faithful talking facial animation, in which more facial details are created, preserved, and transferred from multiple source images instead of a single one. Specifically, we propose a network consisting of a learning stream and a fetching stream. The fetching sub-net directly learns to attentively warp and merge facial regions from five source images with distinctive landmarks, while the learning pipeline renders facial organs from the training face space to compensate. Extensive experiments demonstrate that the proposed method outperforms baseline algorithms both quantitatively and qualitatively. Code is available at https://github.com/kgu3/FLNet_AAAI2020.
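The merge step of the fetching stream can be pictured as a per-pixel attention blend over the warped source images. The sketch below is an illustration under assumptions, not the authors' implementation: it presumes the five sources have already been warped toward the target landmarks, that the network emits one unnormalized attention score per source per pixel, and that a softmax turns those scores into blending weights. The function names `softmax` and `merge_warped_sources` are hypothetical, and plain Python lists stand in for image tensors to keep the example dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_warped_sources(warped, scores):
    """Blend K warped source images pixel by pixel (hypothetical helper).

    warped: K images, each an H x W grid (list of rows) of floats.
    scores: K grids of the same shape with unnormalized attention scores.
    Returns one H x W grid; each pixel is a convex combination of the
    K candidate values at that location.
    """
    height, width = len(warped[0]), len(warped[0][0])
    merged = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Assumption: attention is normalized across sources per pixel.
            weights = softmax([s[y][x] for s in scores])
            merged[y][x] = sum(w * img[y][x] for w, img in zip(weights, warped))
    return merged
```

With equal scores every source contributes equally, so the output is the plain average of the candidates. A strongly peaked score lets a single source, for example the one frame in which the teeth are visible, dominate that pixel.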

