AINov 9, 2023

META4: Semantically-Aligned Generation of Metaphoric Gestures Using Self-Supervised Text and Speech Representation

arXiv:2311.05481v21 citationsh-index: 18
Originality Incremental advance
AI Analysis

This addresses the challenge of making virtual agents more expressive and semantically aligned in human-computer interaction, though it is incremental as it builds on existing gesture generation models by adding a new semantic component.

The paper tackles the problem of generating metaphoric gestures for virtual agents by incorporating Image Schemas to capture semantic meaning, resulting in a method that effectively models gestures driven by both speech and these schemas.

Image Schemas are repetitive cognitive patterns that influence the way we conceptualize and reason about various concepts present in speech. These patterns are deeply embedded within our cognitive processes and are reflected in our bodily expressions including gestures. Particularly, metaphoric gestures possess essential characteristics and semantic meanings that align with Image Schemas, to visually represent abstract concepts. The shape and form of gestures can convey abstract concepts, such as extending the forearm and hand or tracing a line with hand movements to visually represent the image schema of PATH. Previous behavior generation models have primarily focused on utilizing speech (acoustic features and text) to drive the generation model of virtual agents. They have not considered key semantic information as those carried by Image Schemas to effectively generate metaphoric gestures. To address this limitation, we introduce META4, a deep learning approach that generates metaphoric gestures from both speech and Image Schemas. Our approach has two primary goals: computing Image Schemas from input text to capture the underlying semantic and metaphorical meaning, and generating metaphoric gestures driven by speech and the computed image schemas. Our approach is the first method for generating speech driven metaphoric gestures while leveraging the potential of Image Schemas. We demonstrate the effectiveness of our approach and highlight the importance of both speech and image schemas in modeling metaphoric gestures.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes