AS · AI · MM · SD · Jun 15, 2023

Taming Diffusion Models for Music-driven Conducting Motion Generation

arXiv:2306.10065v2 · 15 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses music-driven motion generation for conducting. The advance is incremental: it applies diffusion models to a domain previously dominated by GANs.

The paper tackles generating orchestral conductor motion from symphony music. It proposes Diffusion-Conductor, a DDIM-based diffusion model integrated into a two-stage learning framework. The authors report that it outperforms prior GAN-based methods in training stability and output quality, and they evaluate it with newly designed metrics including Frechet Gesture Distance (FGD) and Beat Consistency Score (BC).
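To make the beat-based metric concrete, here is a minimal sketch of a beat consistency score in the form commonly used in the gesture and dance generation literature: each audio beat is matched to its nearest motion beat and scored with a Gaussian kernel. This summary does not give the paper's exact definition, so the formula, the function name `beat_consistency`, and the `sigma` default are assumptions for illustration only.

```python
import math


def beat_consistency(audio_beats, motion_beats, sigma=0.1):
    """Average Gaussian-weighted closeness of each audio beat to its nearest motion beat.

    audio_beats, motion_beats: beat times in seconds.
    sigma: tolerance in seconds (an assumed default, not taken from the paper).
    Returns a value in [0, 1]; 1.0 means every audio beat coincides with a motion beat.
    """
    if not audio_beats or not motion_beats:
        return 0.0
    total = 0.0
    for t_audio in audio_beats:
        # Distance from this audio beat to the closest motion beat.
        nearest = min(abs(t_audio - t_motion) for t_motion in motion_beats)
        total += math.exp(-(nearest ** 2) / (2.0 * sigma ** 2))
    return total / len(audio_beats)
```

In practice the motion beats would have to be extracted from the generated pose sequence first, for example from local minima of joint velocity. That step is left out here because the summary does not describe it.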

Generating the motion of orchestral conductors from a given piece of symphony music is a challenging task since it requires a model to learn semantic music features and capture the underlying distribution of real conducting motion. Prior works have applied Generative Adversarial Networks (GANs) to this task, but the promising diffusion model, which recently showed its advantages in terms of both training stability and output quality, has not been exploited in this context. This paper presents Diffusion-Conductor, a novel DDIM-based approach for music-driven conducting motion generation, which integrates the diffusion model into a two-stage learning framework. We further propose a random masking strategy to improve feature robustness, and use a pair of geometric loss functions to impose additional regularization and increase motion diversity. We also design several novel metrics, including Frechet Gesture Distance (FGD) and Beat Consistency Score (BC), for a more comprehensive evaluation of the generated motion. Experimental results demonstrate the advantages of our model.
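For readers unfamiliar with DDIM, the sketch below shows one deterministic DDIM update (the eta = 0 case from the general DDIM formulation) on a flat list of values. It is not the Diffusion-Conductor implementation: the function name `ddim_step`, the plain-list representation, and the idea that `eps_pred` comes from a music-conditioned noise predictor are assumptions made for illustration.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t: noisy sample at step t (list of floats, e.g. flattened pose values).
    eps_pred: noise predicted by the model for x_t (same length).
    alpha_bar_t, alpha_bar_prev: cumulative noise-schedule products at the
        current and the previous (less noisy) step, each in (0, 1].
    """
    sqrt_a_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_a_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_a_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_a_prev = math.sqrt(1.0 - alpha_bar_prev)

    x_prev = []
    for x, eps in zip(x_t, eps_pred):
        # Estimate the clean sample implied by the current noise prediction.
        x0_hat = (x - sqrt_one_minus_a_t * eps) / sqrt_a_t
        # Move to the previous step along the predicted noise direction.
        x_prev.append(sqrt_a_prev * x0_hat + sqrt_one_minus_a_prev * eps)
    return x_prev


# Usage: if the noise prediction is exact, stepping to alpha_bar_prev = 1.0
# recovers the clean sample.
x0 = [0.5, -1.0, 2.0]
noise = [0.1, 0.3, -0.2]
a_t = 0.25
x_noisy = [math.sqrt(a_t) * x + math.sqrt(1.0 - a_t) * n for x, n in zip(x0, noise)]
recovered = ddim_step(x_noisy, noise, a_t, 1.0)
```

Because the update is deterministic, sampling can skip steps in the noise schedule, which is the usual reason for choosing DDIM over ancestral sampling.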

Code Implementations: 1 repo
