LG · AS · ML · Dec 12, 2019

Speech-driven facial animation using polynomial fusion of features

arXiv:1912.05833v2 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating more natural and synchronized facial animations for applications in virtual reality, gaming, and human-computer interaction, representing an incremental improvement over prior deep learning approaches.

The paper addresses the problem of generating realistic talking-face videos from speech. Existing methods account only for first-order interactions between features; the authors propose a polynomial fusion layer that models higher-order interactions and report improved performance on video quality, audiovisual synchronization, and blink generation metrics.

Speech-driven facial animation involves using a speech signal to generate realistic videos of talking faces. Recent deep learning approaches to facial synthesis rely on extracting low-dimensional representations and concatenating them, followed by a decoding step of the concatenated vector. This accounts for only first-order interactions of the features and ignores higher-order interactions. In this paper we propose a polynomial fusion layer that models the joint representation of the encodings by a higher-order polynomial, with the parameters modelled by a tensor decomposition. We demonstrate the suitability of this approach through experiments on generated videos evaluated on a range of metrics on video quality, audiovisual synchronisation and generation of blinks.
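
To make the idea of polynomial fusion concrete, here is a minimal NumPy sketch of a second-order fusion of two encodings. It is an illustration under stated assumptions, not the authors' implementation: the names (polynomial_fusion, z_audio, z_id, W1, A, B, C), the dimensions, the rank, and the choice of a CP (canonical polyadic) factorisation for the quadratic term are all hypothetical, since the abstract only says the parameters are modelled by a tensor decomposition. The sketch shows that the factorised quadratic term gives the same output as the full third-order weight tensor without ever forming it.

```python
import numpy as np

def polynomial_fusion(z, b, W1, C, A, B):
    """Second-order polynomial of the joint encoding z.

    The quadratic weights are never stored as a full (d_out, d, d) tensor.
    They are assumed to follow a rank-R CP factorisation with factors
    C (d_out, R), A (d, R) and B (d, R).
    """
    first_order = W1 @ z                         # linear term, as in plain concatenation + decoding
    second_order = C @ ((A.T @ z) * (B.T @ z))   # pairwise feature interactions
    return b + first_order + second_order

rng = np.random.default_rng(0)

# Hypothetical sizes: audio encoding, identity encoding, output, CP rank.
d_audio, d_id, d_out, rank = 4, 3, 5, 2

z_audio = rng.standard_normal(d_audio)
z_id = rng.standard_normal(d_id)
z = np.concatenate([z_audio, z_id])              # joint encoding
d = z.size

b = rng.standard_normal(d_out)
W1 = rng.standard_normal((d_out, d))
C = rng.standard_normal((d_out, rank))
A = rng.standard_normal((d, rank))
B = rng.standard_normal((d, rank))

fused = polynomial_fusion(z, b, W1, C, A, B)

# Sanity check: rebuild the full quadratic tensor and compare.
W2 = np.einsum("or,ir,jr->oij", C, A, B)
explicit = b + W1 @ z + np.einsum("oij,i,j->o", W2, z, z)
assert np.allclose(fused, explicit)
```

With these sizes the full quadratic tensor would hold d_out * d * d = 245 weights, while the factorised form holds (d_out + 2 * d) * rank = 38. That saving is the usual motivation for pairing a polynomial layer with a tensor decomposition.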

