Incorporating Spatial Awareness in Data-Driven Gesture Generation for Virtual Agents
This work addresses the need for more natural human-agent communication by enabling embodied conversational agents to interact with their environment. The contribution is incremental, since it builds on existing data-driven methods.
The paper tackles the lack of spatial context in gesture generation for virtual agents by integrating scene information into speech-driven gesture synthesis, and it introduces a novel synthetic gesture dataset for this purpose.
This paper focuses on enhancing human-agent communication by integrating spatial context into virtual agents' non-verbal behaviors, specifically gestures. Recent advances in co-speech gesture generation have relied primarily on data-driven methods, which produce natural motion but restrict gestures to those performed in a void, with no reference to the surrounding scene. Our work extends these methods by enabling generative models to incorporate scene information into speech-driven gesture synthesis. To this end, we introduce a novel synthetic gesture dataset tailored for this purpose. This development is a critical step toward embodied conversational agents that interact more naturally with their environment and users.