CV · LG · AS · Jun 10, 2019

Learning Individual Styles of Conversational Gesture

arXiv:1906.04160v1 · 392 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating personalized conversational gestures for applications such as virtual avatars and human-computer interaction. The contribution appears incremental, since it builds on existing cross-modal translation methods.

The paper tackles the problem of generating plausible hand and arm gestures from speech audio for individual speakers. It trains on unlabeled videos whose only supervision is noisy pose data from an automatic detector, and reports that the proposed model significantly outperforms baseline methods in a quantitative comparison.

Human speech is often accompanied by hand and arm gestures. Given audio speech input, we generate plausible gestures to go along with the sound. Specifically, we perform cross-modal translation from "in-the-wild" monologue speech of a single speaker to their hand and arm motion. We train on unlabeled videos for which we only have noisy pseudo ground truth from an automatic pose detection system. Our proposed model significantly outperforms baseline methods in a quantitative comparison. To support research toward obtaining a computational understanding of the relationship between gesture and speech, we release a large video dataset of person-specific gestures. The project website with video, code and data can be found at http://people.eecs.berkeley.edu/~shiry/speech2gesture .
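The training setup described in the abstract (audio in, pose out, supervised only by noisy output from a pose detector) can be illustrated with a deliberately tiny sketch. Everything below is a hypothetical stand-in, not the authors' model: a single scalar audio feature per frame, a linear map to two keypoint coordinates, synthetic pseudo ground truth with small jitter plus occasional simulated detector failures, and an L1 regression loss chosen here for its robustness to such outliers. The sketch fits the regressor on pseudo labels, then compares it against a mean-pose baseline on held-out frames scored against the clean poses.

```python
import random


def make_data(n, noise=0.05, outlier_rate=0.1, seed=0):
    """Hypothetical toy (audio, pose) pairs; not the paper's data.

    Each frame has one scalar audio feature; the pose is two keypoint
    coordinates that depend linearly on it. The pseudo ground truth mimics
    a pose detector: small Gaussian jitter plus occasional gross failures.
    """
    rng = random.Random(seed)
    audio, noisy, clean = [], [], []
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        true_pose = [2.0 * a + 0.5, -1.0 * a]
        pseudo = []
        for p in true_pose:
            if rng.random() < outlier_rate:
                pseudo.append(rng.uniform(-5.0, 5.0))  # simulated detector failure
            else:
                pseudo.append(p + rng.gauss(0.0, noise))  # small jitter
        audio.append(a)
        noisy.append(pseudo)
        clean.append(true_pose)
    return audio, noisy, clean


def sign(x):
    return (x > 0) - (x < 0)


def train_l1(audio, poses, steps=1000, lr=0.05):
    """Fit pose[j] ~ w[j] * audio + b[j] by subgradient descent on the L1 loss."""
    k = len(poses[0])
    w, b = [0.0] * k, [0.0] * k
    n = len(audio)
    for step in range(steps):
        step_lr = lr / (1.0 + step / 200.0)  # decaying step size
        for j in range(k):
            gw = gb = 0.0
            for a, y in zip(audio, poses):
                s = sign(w[j] * a + b[j] - y[j])  # subgradient of |residual|
                gw += s * a
                gb += s
            w[j] -= step_lr * gw / n
            b[j] -= step_lr * gb / n
    return w, b


def predict(w, b, audio):
    return [[wj * a + bj for wj, bj in zip(w, b)] for a in audio]


def mean_l1(preds, targets):
    """Average absolute error per keypoint coordinate."""
    total = sum(abs(p - t) for pr, tg in zip(preds, targets) for p, t in zip(pr, tg))
    return total / (len(targets) * len(targets[0]))


train_audio, train_pseudo, _ = make_data(300, seed=0)
test_audio, _, test_clean = make_data(200, seed=1)

w, b = train_l1(train_audio, train_pseudo)
model_err = mean_l1(predict(w, b, test_audio), test_clean)

# Baseline: always output the average pseudo-ground-truth pose.
k = len(train_pseudo[0])
mean_pose = [sum(p[j] for p in train_pseudo) / len(train_pseudo) for j in range(k)]
baseline_err = mean_l1([mean_pose] * len(test_audio), test_clean)

print(f"L1 regressor error: {model_err:.3f}")
print(f"mean-pose baseline error: {baseline_err:.3f}")
```

Swapping the L1 loss for squared error in this toy would let the simulated detector failures pull the fitted slope toward zero, which is one reason a robust loss is a natural choice when labels come from an automatic detector.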

Code Implementations: 2 repos
