AS | CV | SD | Dec 19, 2017

Audio to Body Dynamics

arXiv:1712.09382v1 | 169 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating realistic avatar animations for musicians from audio, which is a novel application in computer vision and audio processing.

The paper tackles the problem of predicting natural body dynamics from audio alone, specifically for piano and violin playing, and demonstrates that it is possible to generate skeleton-based animations of an avatar's hands and arms from music input.

We present a method that takes audio of violin or piano playing as input and outputs a video of skeleton predictions, which are then used to animate an avatar. The key idea is to animate an avatar that moves its hands much as a pianist or violinist would, using audio alone. Fully detailed, correct arm and finger motion is the eventual goal; however, it is not clear whether body movement can be predicted from music at all. In this paper, we present the first result showing that natural body dynamics can indeed be predicted from audio. We built an LSTM network trained on violin and piano recital videos uploaded to the Internet. The predicted points are applied to a rigged avatar to create the animation.
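To make the audio-to-skeleton mapping described in the abstract concrete, here is a minimal sketch of a single-layer LSTM that reads one audio feature vector per frame and regresses 2D keypoint coordinates for that frame. It is an illustration only, not the authors' implementation: the class name `TinyLSTMRegressor`, the feature size (13, a stand-in for something like MFCCs), the hidden size, and the keypoint count are all assumptions, the weights are random and untrained, and the input is random noise rather than real audio.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMRegressor:
    """Hypothetical single-layer LSTM plus linear readout (forward pass only)."""

    def __init__(self, n_features, n_hidden, n_keypoints, seed=0):
        rng = np.random.default_rng(seed)
        # One matrix holds the weights of all four gates (input, forget, cell, output).
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_features + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        # Linear readout from the hidden state to (x, y) per keypoint.
        self.W_out = rng.normal(0.0, 0.1, (2 * n_keypoints, n_hidden))
        self.b_out = np.zeros(2 * n_keypoints)
        self.n_hidden = n_hidden
        self.n_keypoints = n_keypoints

    def predict(self, audio_features):
        """Map (T, n_features) audio features to (T, n_keypoints, 2) coordinates."""
        h = np.zeros(self.n_hidden)  # hidden state
        c = np.zeros(self.n_hidden)  # cell state
        frames = []
        for x_t in audio_features:
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            frames.append(self.W_out @ h + self.b_out)
        return np.stack(frames).reshape(-1, self.n_keypoints, 2)


# Placeholder input: 100 frames of 13 random features (assumed sizes, not real audio).
features = np.random.default_rng(1).normal(size=(100, 13))
model = TinyLSTMRegressor(n_features=13, n_hidden=32, n_keypoints=10)
keypoints = model.predict(features)
print(keypoints.shape)  # (100, 10, 2)
```

In a real pipeline the weights would be trained against keypoints extracted from recital videos, and the predicted points would then drive a rigged avatar, as the abstract describes.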

Code Implementations: 1 repo
