Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model
This work provides a novel approach for generating realistic and music-synchronized dance movements, which is beneficial for content creators and virtual reality applications, offering an incremental improvement over existing methods.
The paper introduces MambaDance, a Mamba-based diffusion model for dance generation that addresses the limitations of existing methods in capturing the sequential, rhythmical, and music-synchronized characteristics of dance. It integrates Mamba into a two-stage diffusion architecture and proposes a Gaussian-based beat representation to guide dance sequence decoding, producing plausible and characteristic dance movements across various sequence lengths on the AIST++ and FineDance datasets.
Dance is a form of human motion characterized by emotional expression and communication, and it plays a role in fields such as music, virtual reality, and content creation. Existing methods for dance generation often fail to adequately capture the inherently sequential, rhythmical, and music-synchronized characteristics of dance. In this paper, we propose \emph{MambaDance}, a new dance generation approach that leverages a Mamba-based diffusion model. Mamba, which is well suited to handling long and autoregressive sequences, is integrated into our two-stage diffusion architecture in place of the off-the-shelf Transformer. Additionally, considering the critical role of musical beats in dance choreography, we propose a Gaussian-based beat representation to explicitly guide the decoding of dance sequences. Experiments on the AIST++ and FineDance datasets at each sequence length show that, compared with previous methods, our proposed method generates plausible dance movements that reflect these essential characteristics consistently, from short to long dances. Additional qualitative results and demo videos are available at \small{https://vision3d-lab.github.io/mambadance}.
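To illustrate what a Gaussian-based beat representation might look like, the following is a minimal sketch and not the paper's implementation, since the abstract does not give the exact formulation. It assumes beats are given as frame indices and that each beat contributes a Gaussian bump, with the per-frame value taken as the maximum over beats. The function names, the `sigma` width, and the max-combination rule are all hypothetical choices made for illustration. A binary (one-hot) beat signal is included only for comparison.

```python
import math


def gaussian_beat_curve(beat_frames, num_frames, sigma=2.0):
    """Per-frame beat signal in [0, 1] that peaks at each beat frame.

    Assumption: beats are combined with max(); sigma is an illustrative width.
    """
    curve = [0.0] * num_frames
    for t in range(num_frames):
        for b in beat_frames:
            # Gaussian bump centred on beat b; keep the strongest response.
            value = math.exp(-((t - b) ** 2) / (2.0 * sigma ** 2))
            curve[t] = max(curve[t], value)
    return curve


def binary_beat_curve(beat_frames, num_frames):
    """Baseline for comparison: 1.0 exactly on beat frames, 0.0 elsewhere."""
    curve = [0.0] * num_frames
    for b in beat_frames:
        if 0 <= b < num_frames:
            curve[b] = 1.0
    return curve


# Hypothetical example: two beats in a 20-frame clip.
beats = [5, 15]
soft = gaussian_beat_curve(beats, num_frames=20, sigma=2.0)
hard = binary_beat_curve(beats, num_frames=20)
```

Under these assumptions, the binary signal carries information only on the beat frame itself, whereas the Gaussian signal also tells neighbouring frames how close they are to a beat, which is one plausible reason such a representation could help guide decoding.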