HC · CV · LG · Oct 2, 2020

Understanding the Predictability of Gesture Parameters from Speech and their Perceptual Importance

arXiv:2010.00995v1 · 17 citations
Originality: Incremental advance
AI Analysis

This work addresses the limited success of black-box gesture generators for embodied conversational agents by providing insights into speech-gesture relationships. It is an incremental advance: it builds on existing methods without introducing a new paradigm.

The study investigated how well specific gesture parameters (e.g., speed, size) can be predicted from speech using recurrent networks. It found partial predictability, with some parameters, such as path length, predicted more accurately than others, such as velocity. A perceptual study then assessed the importance of each parameter: degradation in any parameter was viewed negatively, with hand shape changes being particularly impactful.

Gesture behavior is a natural part of human conversation. Much work has focused on removing the need for tedious hand-animation to create embodied conversational agents by designing speech-driven gesture generators. However, these generators often work in a black-box manner, assuming a general relationship between input speech and output motion. As their success remains limited, we investigate in more detail how speech may relate to different aspects of gesture motion. We determine a number of parameters characterizing gesture, such as speed and gesture size, and explore their relationship to the speech signal in a two-fold manner. First, we train multiple recurrent networks to predict the gesture parameters from speech to understand how well gesture attributes can be modeled from speech alone. We find that gesture parameters can be partially predicted from speech, with some parameters, such as path length, being predicted more accurately than others, such as velocity. Second, we design a perceptual study to assess the importance of each gesture parameter for producing motion that people perceive as appropriate for the speech. Results show that a degradation in any parameter was viewed negatively, but some changes, such as hand shape, are more impactful than others. A video summary can be found at https://youtu.be/aw6-_5kmLjY.
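
To make the idea of a gesture parameter concrete, here is a minimal sketch of how quantities like path length, speed, and gesture size might be computed from a sequence of 3D wrist positions. The function names, the bounding-box definition of size, and the sample trajectory are illustrative assumptions, not the definitions used in the paper.

```python
import math

def path_length(points):
    """Total distance travelled: sum of distances between consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def mean_speed(points, fps):
    """Average speed over the clip: path length divided by elapsed time."""
    if len(points) < 2:
        return 0.0
    duration = (len(points) - 1) / fps  # seconds between first and last frame
    return path_length(points) / duration

def gesture_size(points):
    """Assumed proxy for gesture size: diagonal of the axis-aligned bounding box."""
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    return math.dist(lows, highs)

# Hypothetical wrist trajectory (x, y, z), sampled at 2 frames per second.
wrist = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]

print(path_length(wrist))      # segments of length 5 and 12
print(mean_speed(wrist, 2.0))  # two frame intervals at 2 fps span 1 second
print(gesture_size(wrist))     # bounding box from (0, 0, 0) to (3, 4, 12)
```

In a setup like the one the abstract describes, per-gesture values of this kind would serve as the targets a recurrent network tries to predict from speech features.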

