SD · AI · LG · MM · AS · Mar 14, 2024

LM2D: Lyrics- and Music-Driven Dance Synthesis

arXiv:2403.09407v1 · 6 citations
Originality: Incremental advance
AI Analysis

The work addresses a gap in dance synthesis for applications such as entertainment and virtual performances, though the advance is incremental: it adds lyrics as a conditioning signal to existing audio-based methods.

The paper tackles the problem of synthesizing dance movements conditioned on both music and lyrics, rather than just audio, and demonstrates that LM2D produces realistic and diverse dance matching both inputs.

Dance typically involves professional choreography with complex movements that follow a musical rhythm and can also be influenced by lyrical content. The integration of lyrics, in addition to the auditory dimension, enriches the foundational tone and makes motion generation more amenable to its semantic meanings. However, existing dance synthesis methods tend to model motions conditioned only on audio signals. In this work, we make two contributions to bridge this gap. First, we propose LM2D, a novel probabilistic architecture that incorporates a multimodal diffusion model with consistency distillation, designed to create dance conditioned on both music and lyrics in one diffusion generation step. Second, we introduce the first 3D dance-motion dataset that encompasses both music and lyrics, obtained with pose estimation technologies. We evaluate our model against music-only baseline models with objective metrics and human evaluations, including dancers and choreographers. The results demonstrate that LM2D is able to produce realistic and diverse dance matching both lyrics and music. A video summary can be accessed at: https://youtu.be/4XCgvYookvA.
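
To make "one diffusion generation step" concrete, here is a minimal sketch of single-step sampling from a consistency-style model conditioned on two feature vectors. It is not LM2D's implementation. The network (`toy_network`), the feature dimensions, the concatenation of music and lyric features, and the random weights are all hypothetical placeholders. The only borrowed piece is the skip/output scaling commonly used for consistency models, which guarantees that the function returns its input unchanged at the smallest noise level.

```python
import math
import random

# Assumed constants (common defaults for consistency models, not taken from LM2D).
SIGMA_DATA = 0.5   # assumed scale of the clean data
EPS = 0.002        # smallest noise level; the model must be the identity here
T_MAX = 80.0       # largest noise level; one-step sampling starts here


def c_skip(t):
    """Weight on the noisy input; equals 1 at t = EPS."""
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    """Weight on the network output; equals 0 at t = EPS."""
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)


def toy_network(x, t, cond, weights):
    """Hypothetical stand-in for a trained denoiser: one linear layer + tanh."""
    h = x + cond + [math.log(t)]  # concatenate pose, condition, and time feature
    return [math.tanh(sum(w * v for w, v in zip(row, h))) for row in weights]


def consistency_fn(x, t, cond, weights):
    """Map a noisy pose at noise level t directly to a clean-pose estimate."""
    out = toy_network(x, t, cond, weights)
    return [c_skip(t) * xi + c_out(t) * oi for xi, oi in zip(x, out)]


def sample_one_step(music_feat, lyric_feat, weights, pose_dim, rng):
    """Draw pure noise at T_MAX and denoise it with a single function call."""
    cond = music_feat + lyric_feat  # assumed fusion: plain concatenation
    x_T = [T_MAX * rng.gauss(0.0, 1.0) for _ in range(pose_dim)]
    return consistency_fn(x_T, T_MAX, cond, weights)


rng = random.Random(0)
pose_dim, music_dim, lyric_dim = 6, 4, 3  # hypothetical sizes
in_dim = pose_dim + music_dim + lyric_dim + 1
weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(pose_dim)]
music = [rng.gauss(0.0, 1.0) for _ in range(music_dim)]
lyrics = [rng.gauss(0.0, 1.0) for _ in range(lyric_dim)]

pose = sample_one_step(music, lyrics, weights, pose_dim, rng)
print(len(pose))  # one value per pose dimension
```

A standard diffusion sampler would call the network many times while stepping the noise level down. The sketch calls it once, which is the property that consistency distillation is meant to provide. With random weights the output is not a meaningful pose.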

