CV · Oct 17, 2024

Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation

arXiv:2410.13786v1 · 2 citations · h-index: 10 · MM
Originality: Incremental advance
AI Analysis

This work improves speech-driven gesture generation for applications such as virtual avatars and human-computer interaction. It is an incremental advance, however, because it builds on existing neural-network methods and adds targeted enhancements.

The paper tackles gesture generation from speech by addressing two shortcomings of prior methods: the missing semantic association between audio and gesture, and the poor handling of salient gestures. The resulting method outperforms state-of-the-art approaches in the reported experiments.

Speech-driven gesture generation aims to synthesize a gesture sequence synchronized with the input speech signal. Previous methods leverage neural networks to directly map a compact audio representation to the gesture sequence, ignoring the semantic association between the different modalities and failing to deal with salient gestures. In this paper, we propose a novel speech-driven gesture generation method that emphasizes the semantic consistency of salient postures. Specifically, we first learn a joint manifold space for the individual representations of audio and body pose to exploit the inherent semantic association between the two modalities, and propose to enforce semantic consistency via a consistency loss. Furthermore, we emphasize the semantic consistency of salient postures by introducing a weakly-supervised detector to identify salient postures, and by reweighting the consistency loss to focus more on learning the correspondence between salient postures and the high-level semantics of the speech content. In addition, we propose to extract audio features dedicated to facial expression and body gesture separately, and design separate branches for face and body gesture synthesis. Extensive experimental results demonstrate the superiority of our method over state-of-the-art approaches.
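The abstract does not give the exact form of the consistency loss or of its salience reweighting, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes per-frame audio and pose embeddings have already been projected into the shared (joint manifold) space, uses cosine distance as the consistency measure, and uses a hypothetical weight `w_t = 1 + alpha * s_t`, where `s_t` in [0, 1] stands in for the output of the weakly-supervised salient-posture detector. The names `salience_weighted_consistency_loss` and `alpha` are illustrative only.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity between two embeddings in the shared space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def salience_weighted_consistency_loss(
    audio_emb: Sequence[Vector],
    pose_emb: Sequence[Vector],
    salience: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Weighted mean of per-frame audio/pose distances in the joint space.

    Assumed (hypothetical) weighting: w_t = 1 + alpha * s_t with s_t in [0, 1]
    and alpha >= 0, so frames flagged as salient contribute more to the loss
    than ordinary frames.
    """
    if not (len(audio_emb) == len(pose_emb) == len(salience)):
        raise ValueError("sequences must have the same number of frames")
    if not audio_emb:
        raise ValueError("need at least one frame")
    if alpha < 0.0:
        raise ValueError("alpha must be non-negative")
    weights = [1.0 + alpha * s for s in salience]
    total = sum(
        w * cosine_distance(a, p) for w, a, p in zip(weights, audio_emb, pose_emb)
    )
    return total / sum(weights)


# Frame 0: audio and pose agree; frame 1: they disagree.
audio = [[1.0, 0.0], [1.0, 0.0]]
pose = [[1.0, 0.0], [0.0, 1.0]]
print(salience_weighted_consistency_loss(audio, pose, [0.0, 0.0]))  # 0.5
print(salience_weighted_consistency_loss(audio, pose, [0.0, 1.0]))  # about 0.667
```

Marking the mismatched frame as salient raises the loss from 0.5 to about 0.667, which illustrates how reweighting pushes training to fix the correspondence for salient postures first. A real system would compute this over batched tensors and let gradients flow into both encoders; plain Python lists are used here only to keep the sketch self-contained.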
