AI · CV · SD · Mar 28

TokenDance: Token-to-Token Music-to-Dance Generation with Bidirectional Mamba

arXiv:2603.27314
AI Analysis

TokenDance improves dance generation quality and inference speed over existing methods, with applications in virtual reality and digital character animation. The gain is incremental, however, because the method builds on established tokenization schemes and Mamba architectures.

TokenDance addresses the poor generalization of music-to-dance models to real-world music with a two-stage framework that combines dual-modality tokenization with a Bidirectional Mamba generator, achieving overall state-of-the-art (SOTA) generation quality and inference speed.
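The key property of a *bidirectional* generator is that every output token is conditioned on both earlier and later tokens, which is what permits non-autoregressive (all-at-once) inference. The sketch below illustrates only that idea. It assumes a heavily simplified stand-in for Mamba: a diagonal linear recurrence with an input-dependent decay, run left-to-right and right-to-left with separate weights and summed. Real Mamba blocks add discretization, state expansion, a convolution and gating, and the paper's Local-Global-Local structure is not modeled here. The names `selective_scan`, `scan_params` and `bidirectional_scan` are hypothetical, not TokenDance's code or the `mamba_ssm` API.

```python
import numpy as np

def selective_scan(x, a, b, c):
    """Hypothetical simplified scan: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t.

    x, a, b, c have shape (T, d); a in (0, 1) is an input-dependent decay,
    loosely mimicking a selective state update with one state per channel.
    """
    T, d = x.shape
    h = np.zeros(d)
    y = np.empty_like(x)
    for t in range(T):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] * h
    return y

def scan_params(seq, w_a, w_b, w_c):
    """Derive per-position (a, b, c) from the tokens themselves ("selective")."""
    a = 1.0 / (1.0 + np.exp(-(seq @ w_a)))  # sigmoid keeps the decay in (0, 1)
    return a, seq @ w_b, seq @ w_c

def bidirectional_scan(x, fwd_w, bwd_w):
    """Sum a forward scan and a backward scan so each position sees the whole sequence."""
    fwd = selective_scan(x, *scan_params(x, *fwd_w))
    rev = x[::-1]
    bwd = selective_scan(rev, *scan_params(rev, *bwd_w))[::-1]
    return fwd + bwd

# Usage with random stand-ins for embedded music/dance tokens (hypothetical sizes).
rng = np.random.default_rng(0)
T, d = 6, 4
tokens = rng.normal(size=(T, d))
fwd_w = tuple(rng.normal(scale=0.5, size=(d, d)) for _ in range(3))
bwd_w = tuple(rng.normal(scale=0.5, size=(d, d)) for _ in range(3))
out = bidirectional_scan(tokens, fwd_w, bwd_w)
```

Because the forward scan alone is causal, changing the last token cannot affect earlier forward outputs; adding the backward scan is what lets the first position react to the end of the sequence.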

Music-to-dance generation has broad applications in virtual reality, dance education, and digital character animation. However, the limited coverage of existing 3D dance datasets confines current models to a narrow subset of music styles and choreographic patterns, resulting in poor generalization to real-world music. Consequently, generated dances often become overly simplistic and repetitive, substantially degrading expressiveness and realism. To tackle this problem, we present TokenDance, a two-stage music-to-dance generation framework that explicitly addresses this limitation through dual-modality tokenization and efficient token-level generation. In the first stage, we discretize both dance and music using Finite Scalar Quantization, where dance motions are factorized into upper- and lower-body components with kinematic-dynamic constraints, and music is decomposed into semantic and acoustic features with dedicated codebooks to capture choreography-specific structures. In the second stage, we introduce a Local-Global-Local token-to-token generator built on a Bidirectional Mamba backbone, enabling coherent motion synthesis, strong music-dance alignment, and efficient non-autoregressive inference. Extensive experiments demonstrate that TokenDance achieves overall state-of-the-art (SOTA) performance in both generation quality and inference speed, highlighting its effectiveness and practical value for real-world music-to-dance applications.
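Finite Scalar Quantization replaces a learned codebook with a fixed grid: each latent channel is squashed into a bounded range, rounded to one of a few integer levels, and the per-channel digits are packed into a single token id. The sketch below shows only the inference path under stated assumptions. It handles odd level counts only, omits the straight-through gradient used in training, and uses random arrays in place of a trained motion encoder. The level choice `[7, 5, 5, 5]`, the helper names, and the separate upper-/lower-body token streams are hypothetical illustrations of the factorization described above, not the paper's configuration.

```python
import numpy as np

def fsq_quantize(z, levels):
    """FSQ inference path: bound each channel, then round to an integer grid.

    z has shape (..., d) with d == len(levels). Returns per-channel digits
    in [0, L_i - 1]. This sketch assumes every L_i is odd.
    """
    levels = np.asarray(levels)
    assert np.all(levels % 2 == 1), "this sketch only handles odd level counts"
    half = (levels - 1) / 2
    bounded = half * np.tanh(z)        # squash channel i into (-half_i, half_i)
    q = np.round(bounded).astype(int)  # snap to the nearest grid point
    return q + half.astype(int)        # shift to non-negative digits

def codes_to_index(codes, levels):
    """Pack per-channel digits into one token id (mixed-radix, first channel most significant)."""
    index = np.zeros(codes.shape[:-1], dtype=np.int64)
    for digit, base in zip(np.moveaxis(codes, -1, 0), levels):
        index = index * base + digit
    return index

def index_to_codes(index, levels):
    """Inverse of codes_to_index: unpack a token id into per-channel digits."""
    digits = []
    for base in reversed(levels):
        digits.append(index % base)
        index = index // base
    return np.stack(digits[::-1], axis=-1)

# Hypothetical usage: two body-part streams, each quantized with its own grid.
rng = np.random.default_rng(0)
levels = [7, 5, 5, 5]                    # implicit codebook size 7*5*5*5 = 875
upper_latent = rng.normal(size=(16, 4))  # stand-in for 16 frames of upper-body latents
lower_latent = rng.normal(size=(16, 4))  # stand-in for 16 frames of lower-body latents
upper_tokens = codes_to_index(fsq_quantize(upper_latent, levels), levels)
lower_tokens = codes_to_index(fsq_quantize(lower_latent, levels), levels)
```

The codebook size is simply the product of the levels, so no codebook entries can go unused or collapse; the same functions could in principle tokenize separate music streams (for example semantic and acoustic features) with their own level settings.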
