CV, AI, HC, AS, IV | Oct 28, 2020

Generative Adversarial Networks in Human Emotion Synthesis: A Review

arXiv:2010.15075v2 | 30 citations
Originality: Synthesis-oriented
AI Analysis

This is an incremental review paper that synthesizes existing knowledge for researchers in affective computing and generative modeling.

This paper reviews recent advances in human emotion synthesis using generative models, focusing on audio and video modalities, including facial expression synthesis, speech emotion synthesis, and cross-modal applications, while identifying open research problems for future work.

Synthesizing realistic data samples is of great value to both the academic and industrial communities. Deep generative models have become an emerging topic in research areas such as computer vision and signal processing. Affective computing, a topic of broad interest in the computer vision community, has been no exception and has benefited from generative models. In fact, affective computing has seen a rapid development of generative models during the last two decades. Applications of such models include, but are not limited to, emotion recognition and classification, unimodal emotion synthesis, and cross-modal emotion synthesis. Accordingly, we review recent advances in human emotion synthesis by studying the available databases and the advantages and disadvantages of the generative models, along with the related training strategies, for two principal human communication modalities, namely audio and video. In this context, facial expression synthesis, speech emotion synthesis, and audio-visual (cross-modal) emotion synthesis are reviewed extensively under different application scenarios. Finally, we discuss open research problems to push the boundaries of this research area in future work.

