CV · CL · GR · HC · Sep 30, 2024

MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans

arXiv:2410.00253v1 · 3 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This paper provides a comprehensive dataset for researchers working on virtual humans and gesture generation, though its contribution is incremental: it focuses on data collection rather than on new methods.

The authors tackled the problem of co-speech gesture generation in 3D scenes by creating a novel multi-modal dataset captured in VR, which includes motion capture, speech, gaze, and scene graphs from conversations in a physics simulator.

In this paper, we present a novel dataset captured using a VR headset to record conversations between participants within a physics simulator (AI2-THOR). Our primary objective is to extend the field of co-speech gesture generation by incorporating rich contextual information within referential settings. Participants engaged in various conversational scenarios, all based on referential communication tasks. The dataset provides a rich set of multimodal recordings such as motion capture, speech, gaze, and scene graphs. This comprehensive dataset aims to enhance the understanding and development of gesture generation models in 3D scenes by providing diverse and contextually rich data.
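To make the listed modalities concrete, here is a minimal Python sketch of how one time-aligned sample combining motion capture, speech, gaze, and a scene graph might be represented, and how gaze could be resolved to a referenced object. The abstract does not describe the dataset's actual file layout, so every class, field, and function name below (`SceneObject`, `Frame`, `gazed_object`) is a hypothetical assumption for illustration, not the authors' format or API.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SceneObject:
    """One node of a (hypothetical) scene graph: an object and its position."""
    object_id: str
    position: Vec3


@dataclass
class Frame:
    """One time-aligned multimodal sample (hypothetical layout)."""
    timestamp: float                    # seconds from the start of the recording
    joint_positions: Dict[str, Vec3]    # motion capture: joint name -> position
    gaze_origin: Vec3                   # gaze ray origin (e.g. head position)
    gaze_direction: Vec3                # gaze ray direction, assumed unit length
    spoken_word: Optional[str]          # word being spoken at this time, if any
    scene_objects: List[SceneObject]    # objects present in the scene


def gazed_object(frame: Frame) -> Optional[SceneObject]:
    """Return the scene object with the smallest angle to the gaze ray."""
    best: Optional[SceneObject] = None
    best_angle = math.inf
    for obj in frame.scene_objects:
        # Vector from the gaze origin to the object.
        to_obj = tuple(p - o for p, o in zip(obj.position, frame.gaze_origin))
        norm = math.sqrt(sum(c * c for c in to_obj))
        if norm == 0.0:
            continue  # object coincides with the gaze origin; angle undefined
        cos_angle = sum(d * c for d, c in zip(frame.gaze_direction, to_obj)) / norm
        angle = math.acos(max(-1.0, min(1.0, cos_angle)))
        if angle < best_angle:
            best, best_angle = obj, angle
    return best


# Usage: a speaker looks along the x-axis while saying "mug".
frame = Frame(
    timestamp=12.5,
    joint_positions={"right_wrist": (0.3, 1.1, 0.2)},
    gaze_origin=(0.0, 0.0, 0.0),
    gaze_direction=(1.0, 0.0, 0.0),
    spoken_word="mug",
    scene_objects=[
        SceneObject("mug", (1.0, 0.0, 0.0)),
        SceneObject("lamp", (0.0, 1.0, 0.0)),
    ],
)
target = gazed_object(frame)
```

Pairing the gazed-at object with the concurrent word is one simple way a model could link a referring expression to its referent. The smallest-angle heuristic is a stand-in chosen for brevity, not a method taken from the paper.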

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
