CV · Sep 8, 2024

Leveraging WaveNet for Dynamic Listening Head Modeling from Speech

arXiv:2409.05089v1 · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This work addresses simulating a listener's interactive feedback for applications such as virtual avatars and video conferencing. The contribution is incremental: it combines existing methods into a hybrid approach.

The paper tackles generating realistic listener head videos from speech using a sequence-to-sequence model that combines WaveNet and LSTM, and reports results that surpass baseline models on the ViCo benchmark dataset.

The creation of listener facial responses aims to simulate the interactive feedback a listener gives during a face-to-face conversation. Our goal is to generate believable videos of listeners' heads that respond authentically to a single speaker, using a sequence-to-sequence model that combines WaveNet with a long short-term memory (LSTM) network. Our approach focuses on capturing the subtle nuances of listener feedback, preserving each listener's individual identity while expressing appropriate attitudes and viewpoints. Experimental results show that our method surpasses the baseline models on the ViCo benchmark dataset.
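The abstract does not spell out the architecture, so the following is only a toy sketch of the general idea of pairing WaveNet-style layers with an LSTM, not the authors' model. It assumes a scalar speech feature per time step, kernel size 2, dilations 1, 2, 4, random untrained weights, and a scalar LSTM whose hidden state stands in for a single listener motion coefficient. All function and parameter names (`causal_dilated_conv`, `listener_sequence`, `lstm_params`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: out[t] uses x[t], x[t-d], x[t-2d], ...
    Positions before the start of the sequence are treated as zero."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def gated_residual_layer(x, filt_w, gate_w, dilation):
    """WaveNet-style gated unit, tanh(filter) * sigmoid(gate), plus a residual."""
    f = causal_dilated_conv(x, filt_w, dilation)
    g = causal_dilated_conv(x, gate_w, dilation)
    return [xi + math.tanh(fi) * sigmoid(gi) for xi, fi, gi in zip(x, f, g)]


def receptive_field(kernel_size, dilations):
    """Number of past-and-present input steps that can influence one output."""
    return 1 + (kernel_size - 1) * sum(dilations)


def lstm_step(x_t, h, c, p):
    """One step of a scalar LSTM cell (hypothetical parameter dict p)."""
    i = sigmoid(p["wi"] * x_t + p["ui"] * h + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x_t + p["uf"] * h + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x_t + p["uo"] * h + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x_t + p["ug"] * h + p["bg"])  # candidate state
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c


def listener_sequence(speech, conv_layers, lstm_params):
    """Encode speech with dilated causal layers, then unroll the LSTM over time."""
    feats = speech
    for filt_w, gate_w, dilation in conv_layers:
        feats = gated_residual_layer(feats, filt_w, gate_w, dilation)
    h, c = 0.0, 0.0
    motion = []
    for x_t in feats:
        h, c = lstm_step(x_t, h, c, lstm_params)
        motion.append(h)  # stand-in for one listener head-motion coefficient
    return motion


# Untrained random weights and a synthetic "speech" signal, for illustration only.
rng = random.Random(0)
conv_layers = [
    ([rng.uniform(-1, 1) for _ in range(2)],
     [rng.uniform(-1, 1) for _ in range(2)],
     d)
    for d in (1, 2, 4)
]
lstm_params = {
    name: rng.uniform(-1, 1)
    for name in ("wi", "ui", "bi", "wf", "uf", "bf",
                 "wo", "uo", "bo", "wg", "ug", "bg")
}
speech = [math.sin(0.3 * t) for t in range(16)]
motion = listener_sequence(speech, conv_layers, lstm_params)
```

Because every layer in the sketch is causal, the output at step t depends only on speech up to step t, which is the property that would let a listener model react as the speaker talks. A real system would use multi-channel audio features and predict a vector of head-pose and expression coefficients per frame.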

