Dance Style Transfer with Cross-modal Transformer
This work addresses realistic dance motion synthesis for applications in animation and entertainment, but the contribution is incremental because it builds on an existing CycleGAN architecture.
The paper tackles dance style transfer: transforming a motion clip from one dance style to another while preserving its motion context. It introduces CycleDance, which extends CycleGAN with multimodal transformer encoders and sequence length-based curriculum learning. The results show that it significantly outperforms the baseline CycleGAN on naturalness, transfer strength, and content preservation, as validated by a human study with 30 participants who each have 5 or more years of dance experience.
We present CycleDance, a dance style transfer system that transforms an existing motion clip in one dance style into a motion clip in another dance style while attempting to preserve the motion context of the dance. Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal transformer encoders to account for music context. We adopt sequence length-based curriculum learning to stabilize training. Our approach captures rich, long-term intra-relations between motion frames, the modeling of which is a common challenge in motion transfer and synthesis work. We further introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. We perform an extensive ablation study as well as a human study with 30 participants who each have 5 or more years of dance experience. The results demonstrate that CycleDance generates realistic movements in the target style, significantly outperforming the baseline CycleGAN on naturalness, transfer strength, and content preservation.
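To make the idea of sequence length-based curriculum learning concrete, the following is a minimal sketch in which training clips start short and grow as training proceeds. The abstract does not specify the schedule, so the linear growth rule, the function names `curriculum_length` and `sample_clip`, and the frame counts used below are all illustrative assumptions rather than the authors' implementation.

```python
import random


def curriculum_length(epoch, total_epochs, min_len, max_len):
    """Return the clip length (in frames) to train on at a given epoch.

    Assumption: a linear schedule from min_len to max_len. The paper's
    actual schedule is not stated in the abstract.
    """
    if total_epochs <= 1:
        return max_len
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return min_len + round(frac * (max_len - min_len))


def sample_clip(sequence, length, rng=random):
    """Crop a random contiguous window of `length` frames from a sequence.

    If the sequence is shorter than `length`, the whole sequence is returned.
    """
    length = min(length, len(sequence))
    start = rng.randrange(len(sequence) - length + 1)
    return sequence[start:start + length]


if __name__ == "__main__":
    motion = list(range(300))  # stand-in for 300 motion frames
    for epoch in range(0, 10, 3):
        n = curriculum_length(epoch, total_epochs=10, min_len=32, max_len=128)
        clip = sample_clip(motion, n)
        print(epoch, n, len(clip))
```

Short clips early on give the generators and discriminators an easier problem; longer clips later expose the long-range structure between motion frames that the transformer encoders are meant to capture.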