CL | AI | HC | RO | Mar 26, 2025

SARGes: Semantically Aligned Reliable Gesture Generation via Intent Chain

arXiv:2503.20202v1 | 2 citations | h-index: 20 | Proceedings of the International Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of realistic human-computer interaction by improving the semantic alignment of generated gestures. The advance is incremental because it builds on existing gesture generation methods.

The paper tackles the problem of generating semantically meaningful co-speech gestures. It proposes SARGes, a framework that uses large language models to parse speech and generate gesture labels, and reports 50.2% gesture-labeling accuracy with a single-pass inference time of 0.4 seconds.

Co-speech gesture generation enhances human-computer interaction realism through speech-synchronized gesture synthesis. However, generating semantically meaningful gestures remains a challenging problem. We propose SARGes, a novel framework that leverages large language models (LLMs) to parse speech content and generate reliable semantic gesture labels, which subsequently guide the synthesis of meaningful co-speech gestures. First, we constructed a comprehensive co-speech gesture ethogram and developed an LLM-based intent chain reasoning mechanism that systematically parses and decomposes gesture semantics into structured inference steps following ethogram criteria, effectively guiding LLMs to generate context-aware gesture labels. Subsequently, we constructed an intent chain-annotated text-to-gesture label dataset and trained a lightweight gesture label generation model, which then guides the generation of credible and semantically coherent co-speech gestures. Experimental results demonstrate that SARGes achieves highly semantically aligned gesture labeling (50.2% accuracy) with efficient single-pass inference (0.4 seconds). The proposed method provides an interpretable intent reasoning pathway for semantic gesture synthesis.
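To make the idea of an intent chain more concrete, the following is a minimal sketch of staged reasoning that ends in a gesture label. It is not the authors' implementation: the label set (McNeill-style gesture classes), the three reasoning steps, and the names `GESTURE_LABELS`, `INTENT_CHAIN_STEPS`, `run_intent_chain`, and `stub_llm` are all assumptions made for illustration, and the paper's actual ethogram and prompts are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical label set (McNeill-style classes); NOT the paper's ethogram.
GESTURE_LABELS = ("deictic", "iconic", "metaphoric", "beat", "none")

# Hypothetical reasoning steps; the paper's actual chain is not given in the abstract.
INTENT_CHAIN_STEPS = (
    "Identify the speaker's communicative intent.",
    "Decide whether that intent calls for a semantic gesture.",
    "Pick one label from: " + ", ".join(GESTURE_LABELS) + ".",
)


@dataclass
class IntentChainResult:
    text: str
    steps: List[str] = field(default_factory=list)  # one answer per reasoning step
    label: str = "none"


def run_intent_chain(text: str, llm: Callable[[str], str]) -> IntentChainResult:
    """Query `llm` once per step, feeding earlier answers back in as context."""
    result = IntentChainResult(text=text)
    context = f"Utterance: {text}"
    for instruction in INTENT_CHAIN_STEPS:
        answer = llm(f"{context}\n{instruction}").strip()
        result.steps.append(answer)
        context += f"\n{instruction} -> {answer}"
    # Accept the final answer only if it is a known label; otherwise fall back.
    final = result.steps[-1].lower()
    result.label = final if final in GESTURE_LABELS else "none"
    return result


def stub_llm(prompt: str) -> str:
    """Toy stand-in for an LLM call so the sketch runs offline."""
    points_somewhere = "over there" in prompt
    last_line = prompt.splitlines()[-1]
    if last_line.startswith("Pick one label"):
        return "deictic" if points_somewhere else "none"
    if last_line.startswith("Decide"):
        return "yes" if points_somewhere else "no"
    return "point out a location" if points_somewhere else "plain statement"


if __name__ == "__main__":
    outcome = run_intent_chain("Put the box over there.", stub_llm)
    print(outcome.steps, outcome.label)
```

Keeping the intermediate answers in `steps` is what makes such a chain inspectable: each label can be traced back to the reasoning that produced it. In the pipeline the abstract describes, chains like this annotate a text-to-label dataset on which a lightweight label generation model is then trained.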

