CV · Mar 20, 2018

Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks

Seyed Ali Jalalifar, Hosein Hasani, Hamid Aghajan

arXiv:1803.07461v19.927 citations

Originality: Incremental advance

AI Analysis

This paper addresses speech-driven facial reenactment for applications such as video conferencing and entertainment. It is an incremental improvement that combines existing techniques.

The paper tackles the problem of generating photo-realistic facial images with accurate lip sync from audio input. A recurrent neural network predicts mouth landmarks from audio features, and a conditional generative adversarial network synthesizes face images conditioned on those landmarks. The authors report highly realistic results.

We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. By using a recurrent neural network, we achieved mouth landmarks based on audio features. We exploited the power of conditional generative adversarial networks to produce highly-realistic face conditioned on a set of landmarks. These two networks together are capable of producing a sequence of natural faces in sync with an input audio track.
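The two-stage data flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the feature size, hidden size, landmark count, and image size are assumed values, the weights are random rather than trained, and `generator_stub` is a hypothetical placeholder that merely rasterizes landmarks where a trained conditional GAN generator would synthesize a face.

```python
import math
import random

N_FEATS = 13    # assumed audio feature size per frame
N_HIDDEN = 16   # assumed recurrent state size
N_POINTS = 20   # assumed number of mouth landmarks
IMG = 32        # assumed output image side length


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LandmarkRNN:
    """Stage 1: a plain (Elman) recurrent net mapping audio features to landmarks."""

    def __init__(self, rng):
        self.w_in = rand_matrix(N_HIDDEN, N_FEATS, rng)
        self.w_rec = rand_matrix(N_HIDDEN, N_HIDDEN, rng)
        self.w_out = rand_matrix(2 * N_POINTS, N_HIDDEN, rng)

    def __call__(self, frames):
        h = [0.0] * N_HIDDEN
        sequence = []
        for x in frames:
            # New state depends on the current audio frame and the previous state.
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]
            flat = matvec(self.w_out, h)
            sequence.append([(flat[2 * i], flat[2 * i + 1]) for i in range(N_POINTS)])
        return sequence


def generator_stub(landmarks):
    """Stage 2 placeholder: stands in for a trained conditional GAN generator.

    It only marks landmark positions on a blank grid.
    """
    img = [[0.0] * IMG for _ in range(IMG)]
    for x, y in landmarks:
        col = min(IMG - 1, max(0, int((x + 1) / 2 * IMG)))
        row = min(IMG - 1, max(0, int((y + 1) / 2 * IMG)))
        img[row][col] = 1.0
    return img


def reenact(frames, rnn):
    # One output image per audio frame, each conditioned on that frame's landmarks.
    return [generator_stub(lm) for lm in rnn(frames)]


rng = random.Random(0)
audio = [[rng.gauss(0, 1) for _ in range(N_FEATS)] for _ in range(5)]
video = reenact(audio, LandmarkRNN(rng))
print(len(video), len(video[0]), len(video[0][0]))
```

The point of the sketch is the interface between the stages: the recurrent network carries state across audio frames, while the image generator sees only one set of landmarks at a time.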
