CV MM Jul 19, 2023

Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline

Zhigang Chang, Weitai Hu, Qing Yang, Shibao Zheng

arXiv:2307.09821v16.811 citations, h-index: 68

Originality: Incremental advance

AI Analysis

This work addresses the challenge of synthesizing non-verbal semantic expressions in dyadic interactions. The contribution is incremental: it builds on the ViCo baseline and adds enhancements to it.

The paper tackles the problem of generating responsive listener head videos from speaker audio and listener reference images, and its solution ranked first on the official leaderboard for the listening head generation track.

In dyadic speaker-listener interactions, the listener's head reactions and the speaker's head movements together constitute an important form of non-verbal semantic expression. The listener head generation task aims to synthesize responsive listener head videos from the speaker's audio and reference images of the listener. Compared with talking-head generation, capturing the correlation cues from the speaker's audio and visual information is more challenging. Following the ViCo baseline scheme, we propose a high-performance solution that enhances the hierarchical semantic extraction capability of the audio encoder module and improves the decoder, renderer, and post-processing modules. Our solution took first place on the official leaderboard for the listening head generation track. This paper is a technical report for the ViCo@2023 Conversational Head Generation Challenge at the ACM Multimedia 2023 conference.
