HC, AI, CV, GR, LG · May 13, 2024

LLAniMAtion: LLAMA Driven Gesture Animation

arXiv:2405.08042v1 · 1 citation · h-index: 4 · Computer Graphics Forum (Print)
AI Analysis

This work addresses the challenge of creating realistic and engaging gestures for interactive agents, offering a novel text-based approach that could enhance animation in conversational settings.

The paper tackles co-speech gesture generation for character animation by driving the model with LLAMA2 features extracted from text instead of the audio features used by traditional methods. It finds that LLAMA2 features alone perform significantly better than audio features, and that combining both modalities yields no significant improvement over LLAMA2 features alone.

Co-speech gesturing is an important modality in conversation, providing context and social cues. In character animation, appropriate and synchronised gestures add realism and can make interactive agents more engaging. Historically, methods for automatically generating gestures were predominantly audio-driven, exploiting the prosodic and speech-related content encoded in the audio signal. In this paper we instead experiment with generating gestures from LLM features extracted from text using LLAMA2. We compare these against audio features, and explore combining the two modalities, in both objective tests and a user study. Surprisingly, our results show that LLAMA2 features on their own perform significantly better than audio features, and that including both modalities yields no significant difference compared with using LLAMA2 features in isolation. We demonstrate that the LLAMA2-based model can generate both beat and semantic gestures without any audio input, suggesting that LLMs can provide rich encodings well suited to gesture generation.
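The abstract does not say how word-level text features are synchronised with motion frames or how the two modalities are combined. The sketch below is an illustration under stated assumptions, not the paper's method: it assumes each word has a feature vector (standing in for a LLAMA2 hidden state) and a start/end timestamp, repeats each word's vector over the frames it spans, and fuses text and audio by per-frame concatenation. The function names `align_word_features_to_frames` and `concat_modalities` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def align_word_features_to_frames(
    word_features: Sequence[Vector],
    word_spans: Sequence[Tuple[float, float]],
    num_frames: int,
    fps: float,
) -> List[Vector]:
    """Hypothetical helper: upsample word-level features to the motion frame rate.

    Assumption: each word's feature vector is repeated over every frame whose
    index falls in [start * fps, end * fps). Frames not covered by any word
    (e.g. silence) are filled with a zero vector.
    """
    dim = len(word_features[0]) if word_features else 0
    frames = [[0.0] * dim for _ in range(num_frames)]
    for feat, (start, end) in zip(word_features, word_spans):
        first = max(0, int(start * fps))     # first frame covered by this word
        last = min(num_frames, int(end * fps))  # exclusive end, clamped to the clip
        for t in range(first, last):
            frames[t] = list(feat)
    return frames


def concat_modalities(text_frames: List[Vector], audio_frames: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: concatenate text and audio features frame by frame."""
    if len(text_frames) != len(audio_frames):
        raise ValueError("text and audio streams must have the same number of frames")
    return [t + a for t, a in zip(text_frames, audio_frames)]


if __name__ == "__main__":
    # Two words with toy 2-D "LLAMA2" features, animated at 10 fps for 5 frames.
    text = align_word_features_to_frames(
        [[1.0, 2.0], [3.0, 4.0]], [(0.0, 0.2), (0.2, 0.4)], num_frames=5, fps=10.0
    )
    fused = concat_modalities(text, [[9.0] for _ in range(5)])
    print(fused)
```

Dropping the `concat_modalities` step gives the text-only condition; in this sketch the audio branch is optional, which mirrors the abstract's comparison of text-only, audio-only, and combined inputs.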
