CV · Aug 20, 2019

A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models

arXiv:1908.07262v2 · 9 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the creation of virtual anchors for media and entertainment applications. Its contribution is incremental, since it builds on existing methods such as Word2Vec, Seq2Seq, and Pix2PixHD.

The paper tackles the problem of generating realistic face videos of a virtual anchor reading the news. It develops a framework that combines Seq2Seq and GAN models, and its experimental results indicate that the approach is feasible.

This paper presents a novel framework for generating realistic face video of an anchor reading a given piece of news, a task also known as Virtual Anchor. Given some paragraphs of text, we first use a pretrained Word2Vec model to embed each word into a vector. We then use a Seq2Seq-based model to translate these word embeddings into action units and head poses of the target anchor. These action units and head poses are concatenated with facial landmarks and with the previous $n$ synthesized frames, and the concatenation serves as the input to a Pix2PixHD-based model that synthesizes realistic facial images of the virtual anchor. The experimental results demonstrate that our framework is feasible for virtual anchor synthesis.
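The data flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the authors' implementation: the names `embed`, `translate`, `landmark_fn`, and `generator` are hypothetical stand-ins for the Word2Vec, Seq2Seq, landmark, and Pix2PixHD-based components, frames are flattened to plain lists instead of image tensors, and the zero-filled warm-start history is a guess, since the abstract does not say how the first $n$ frames are initialized or where the landmarks come from.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = List[float]
Controls = Tuple[Vector, Vector]  # (action units, head pose) for one output frame


def build_generator_input(
    action_units: Sequence[float],
    head_pose: Sequence[float],
    landmarks: Sequence[float],
    prev_frames: Iterable[Sequence[float]],
) -> Vector:
    """Concatenate one frame's conditioning signals into a single flat vector."""
    flat: Vector = [*action_units, *head_pose, *landmarks]
    for frame in prev_frames:  # oldest frame first
        flat.extend(frame)
    return flat


def synthesize(
    words: Sequence[str],
    embed: Callable[[str], Vector],                    # hypothetical Word2Vec lookup
    translate: Callable[[List[Vector]], List[Controls]],  # hypothetical Seq2Seq model
    landmark_fn: Callable[[Vector, Vector], Vector],   # hypothetical landmark source
    generator: Callable[[Vector], Vector],             # hypothetical Pix2PixHD-style model
    n: int,
    frame_size: int,
) -> List[Vector]:
    """Run the text -> embeddings -> controls -> frames pipeline autoregressively."""
    embeddings = [embed(w) for w in words]
    controls = translate(embeddings)

    # Assumption: the history starts as n all-zero frames.
    history = deque([[0.0] * frame_size for _ in range(n)], maxlen=n)

    frames: List[Vector] = []
    for action_units, head_pose in controls:
        landmarks = landmark_fn(action_units, head_pose)
        x = build_generator_input(action_units, head_pose, landmarks, history)
        frame = generator(x)
        frames.append(frame)
        history.append(frame)  # deque drops the oldest frame automatically
    return frames
```

Passing simple stub callables for the four components is enough to exercise the loop. The point of the sketch is the feedback path: each synthesized frame re-enters the generator input for the next $n$ steps, which is what the "previous $n$ synthesized frames" term in the abstract refers to.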

