CV · Dec 27, 2021

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

arXiv:2112.13548v3 · 82 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for realistic non-verbal feedback in digital humans, virtual agents, and social robots. It is an incremental advance: it builds on existing talking head generation and shifts the focus to the understudied listening side.

The authors tackle the problem of generating responsive listening head behaviors (e.g., nods, smiles) during face-to-face conversations by introducing the ViCo dataset, with 92 identities and 483 clips, and by releasing a baseline model that conditions on different listening attitudes.

We present a new listening head generation benchmark for synthesizing responsive feedback from a listener (e.g., nod, smile) during a face-to-face conversation. As the indispensable complement to talking head generation, listening head generation has seldom been studied in the literature. Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital humans, virtual agents, and social robots. In this work, we propose a novel dataset, "ViCo", highlighting listening head generation during a face-to-face conversation. A total of 92 identities (67 speakers and 76 listeners) are involved in ViCo, featuring 483 clips in a paired "speaking-listening" pattern, where listeners show three listening styles based on their attitudes: positive, neutral, and negative. Unlike traditional speech-to-gesture or talking head generation, listening head generation takes as input both the audio and visual signals from the speaker and gives non-verbal feedback (e.g., head motions, facial expressions) in real time. Our dataset supports a wide range of applications such as human-to-human interaction, video-to-video translation, and cross-modal understanding and generation. To encourage further research, we also release a listening head generation baseline conditioned on different listening attitudes. Code & ViCo dataset: https://project.mhzhou.com/vico.
