LG, AI, CL, CV | Jul 25, 2025

Salsa as a Nonverbal Embodied Language -- The CoMPAS3D Dataset and Benchmarks

arXiv:2507.19684v1 | h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses the problem of enabling humanoid AI to engage in safe and creative dance interactions with humans. The advance is incremental, as it builds on existing motion capture and AI methods for embodied interaction.

The paper tackles the challenge of modeling interactive, embodied communication in partner dance. It introduces CoMPAS3D, the largest and most diverse motion capture dataset of improvised salsa dancing, with 3 hours of data from 18 dancers across skill levels and expert annotations covering over 2,800 move segments. It also presents benchmarks and a multitask model for tasks such as leader or follower generation.

Imagine a humanoid that can safely and creatively dance with a human, adapting to its partner's proficiency, using haptic signaling as a primary form of communication. While today's AI systems excel at text or voice-based interaction with large language models, human communication extends far beyond text: it includes embodied movement, timing, and physical coordination. Modeling coupled interaction between two agents poses a formidable challenge: it is continuous, bidirectionally reactive, and shaped by individual variation. We present CoMPAS3D, the largest and most diverse motion capture dataset of improvised salsa dancing, designed as a challenging testbed for interactive, expressive humanoid AI. The dataset includes 3 hours of leader-follower salsa dances performed by 18 dancers spanning beginner, intermediate, and professional skill levels. For the first time, we provide fine-grained salsa expert annotations, covering over 2,800 move segments, including move types, combinations, execution errors, and stylistic elements. We draw analogies between partner dance communication and natural language, evaluating CoMPAS3D on two benchmark tasks for synthetic humans that parallel key problems in spoken language and dialogue processing: leader or follower generation with proficiency levels (speaker or listener synthesis), and duet (conversation) generation. Towards a long-term goal of partner dance with humans, we release the dataset, annotations, and code, along with a multitask SalsaAgent model capable of performing all benchmark tasks, alongside additional baselines to encourage research in socially interactive embodied AI and creative, expressive humanoid motion generation.

