SD, GR, LG, AS | Jun 25, 2021

Transflower: probabilistic autoregressive dance generation with multimodal attention

arXiv:2106.13871v3 | 51 citations
Originality: Incremental advance
AI Analysis

This work addresses the generation of realistic and diverse dance sequences from music, with applications in entertainment and animation. It is an incremental advance: a new method applied to a known bottleneck.

The authors tackled the problem of generating dance movements conditioned on music by introducing a probabilistic autoregressive model with multimodal attention and a large 3D dance-motion dataset. They showed that modeling probability distributions and attending to large motion and music contexts are necessary to produce interesting, diverse, and realistic dance that matches the music, as validated by objective metrics and a user study.

Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies, and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines, via objective metrics and a user study, and show that both the ability to model a probability distribution and the ability to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music.
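
The autoregressive, probabilistic idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a toy context encoder (mean-pooling plus a linear map, standing in for the multimodal transformer encoder) and a single conditional affine layer (standing in for the full normalizing flow). All function names, parameter names, and dimensions here are hypothetical and chosen only for illustration.

```python
import numpy as np

def make_params(pose_dim, music_dim, hidden_dim, seed=0):
    """Random toy weights (hypothetical; a real model would learn these)."""
    rng = np.random.default_rng(seed)
    scale = 0.1  # small weights keep exp(log_sigma) well-behaved
    return {
        "W_p": scale * rng.standard_normal((pose_dim, hidden_dim)),
        "W_m": scale * rng.standard_normal((music_dim, hidden_dim)),
        "W_mu": scale * rng.standard_normal((hidden_dim, pose_dim)),
        "W_log_sigma": scale * rng.standard_normal((hidden_dim, pose_dim)),
    }

def encode_context(prev_poses, music_window, params):
    """Toy stand-in for the multimodal encoder: mean-pool each modality, mix linearly."""
    pooled = prev_poses.mean(axis=0) @ params["W_p"] + music_window.mean(axis=0) @ params["W_m"]
    return np.tanh(pooled)

def flow_sample(context, params, rng):
    """One conditional affine flow layer: x = mu(c) + exp(log_sigma(c)) * z, z ~ N(0, I)."""
    mu = context @ params["W_mu"]
    log_sigma = context @ params["W_log_sigma"]
    z = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * z

def flow_log_prob(x, context, params):
    """Exact log-density via change of variables: log N(z; 0, I) - sum(log_sigma)."""
    mu = context @ params["W_mu"]
    log_sigma = context @ params["W_log_sigma"]
    z = (x - mu) * np.exp(-log_sigma)  # invert the affine map
    log_base = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2.0 * np.pi)
    return log_base - np.sum(log_sigma)

def generate(music, seed_poses, params, context_len, seed=0):
    """Sample poses one step at a time, feeding each sample back in as context."""
    rng = np.random.default_rng(seed)
    poses = list(seed_poses)
    for t in range(len(seed_poses), len(music)):
        prev = np.stack(poses[-context_len:])                   # recent motion context
        music_window = music[max(0, t - context_len + 1): t + 1]  # recent music context
        context = encode_context(prev, music_window, params)
        poses.append(flow_sample(context, params, rng))
    return np.stack(poses)

if __name__ == "__main__":
    params = make_params(pose_dim=6, music_dim=4, hidden_dim=8)
    music = np.random.default_rng(1).standard_normal((20, 4))  # placeholder audio features
    dance = generate(music, np.zeros((5, 6)), params, context_len=5)
    print(dance.shape)
```

Because each pose is sampled from a distribution rather than predicted as a single point, running `generate` with different seeds on the same music yields different motion sequences, which is the kind of diversity the abstract attributes to probabilistic modelling.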
