GRAIMM Sep 29, 2023

Emotional Listener Portrait: Neural Listener Head Generation with Emotion

arXiv:2310.00068v2, 20 citations, h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating realistic and controllable listener responses in human-computer interaction, but it is incremental as it builds on existing listener head generation methods.

The paper addresses the non-deterministic nature of fine-grained facial expressions in conversational listener head generation. It proposes the Emotional Listener Portrait (ELP), which models facial motions as discrete codewords and their probability distributions under different emotions, and it reports significant improvements on quantitative metrics over previous methods.

Listener head generation centers on generating non-verbal behaviors (e.g., smile) of a listener in reference to the information delivered by a speaker. A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation, which varies depending on the emotions and attitudes of both the speaker and the listener. To tackle this problem, we propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords and explicitly models the probability distribution of the motions under different emotions in conversation. Benefiting from the "explicit" and "discrete" design, our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate controllable responses with a predetermined attitude. Under several quantitative metrics, our ELP exhibits significant improvements compared to previous methods.
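
The idea in the abstract, composing a motion from discrete codewords drawn from an emotion-conditioned distribution, can be illustrated with a toy sketch. This is not the authors' implementation: the sizes, the emotion labels, the random codebooks and logits, the function names, and the choice to compose codewords by summation are all assumptions made for illustration. A real model would learn the codebooks and would predict the logits from speaker features.

```python
import numpy as np

# Hypothetical sizes and labels, chosen only for illustration (not from the paper).
NUM_SLOTS = 4        # number of codewords composed into one motion
CODEBOOK_SIZE = 8    # entries per slot codebook
MOTION_DIM = 6       # dimensionality of one facial-motion vector
EMOTIONS = ["neutral", "positive", "negative"]


def make_toy_model(seed=0):
    """Return random stand-ins for learned codebooks and per-emotion logits."""
    rng = np.random.default_rng(seed)
    codebooks = rng.normal(size=(NUM_SLOTS, CODEBOOK_SIZE, MOTION_DIM))
    logits = rng.normal(size=(len(EMOTIONS), NUM_SLOTS, CODEBOOK_SIZE))
    return codebooks, logits


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def sample_motion(codebooks, logits, emotion, rng):
    """Sample one codeword index per slot under `emotion` and compose a motion.

    Composition by summing the chosen codewords is an assumption of this sketch.
    """
    probs = softmax(logits[EMOTIONS.index(emotion)])  # (NUM_SLOTS, CODEBOOK_SIZE)
    idx = np.array([rng.choice(CODEBOOK_SIZE, p=p) for p in probs])
    motion = codebooks[np.arange(NUM_SLOTS), idx].sum(axis=0)
    return idx, motion


if __name__ == "__main__":
    codebooks, logits = make_toy_model()
    rng = np.random.default_rng(1)
    # Repeated sampling gives diverse motions; fixing the emotion gives control.
    for emotion in EMOTIONS:
        idx, motion = sample_motion(codebooks, logits, emotion, rng)
        print(emotion, idx, np.round(motion, 2))
```

Because the distribution is explicit and discrete, diversity comes from repeated sampling, while control comes from choosing which emotion's distribution to sample from.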

