How Animals Dance (When You're Not Looking)
The work addresses a niche problem, creating entertaining animal dance content for media and social media applications, and its contribution is incremental because it builds on existing text-to-image and video diffusion methods.
The paper tackles the problem of generating music-synchronized animal dance videos by developing a keyframe-based framework that formulates dance synthesis as a graph optimization problem, producing up to 30-second videos from as few as six input keyframes across various animals and music tracks.
We present a keyframe-based framework for generating music-synchronized, choreography-aware animal dance videos. Starting from a few keyframes representing distinct animal poses (generated via text-to-image prompting or GPT-4o), we formulate dance synthesis as a graph optimization problem: find the optimal keyframe structure that satisfies a specified choreography pattern of beats, which can be automatically estimated from a reference dance video. We also introduce an approach for mirrored pose image generation, essential for capturing symmetry in dance. In-between frames are synthesized using a video diffusion model. With as few as six input keyframes, our method can produce up to 30-second dance videos across a wide range of animals and music tracks.
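
To make the graph-optimization framing concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes the choreography pattern is a string with one label per beat (repeated labels mean the same pose recurs), that keyframes are graph nodes, and that a pairwise transition cost between keyframes is available. The function name `assign_keyframes`, the cost matrix, and the brute-force search are all hypothetical choices made for illustration.

```python
from itertools import permutations


def assign_keyframes(pattern, cost):
    """Assign one distinct keyframe to each distinct label in `pattern`.

    pattern: string with one label per beat, e.g. "ABABCBCB" (assumed format).
    cost:    square matrix; cost[i][j] is a hypothetical cost of moving
             from keyframe i to keyframe j between consecutive beats.
    Returns the per-beat keyframe sequence and its total transition cost.
    """
    labels = sorted(set(pattern))
    n = len(cost)
    if len(labels) > n:
        raise ValueError("more distinct pattern labels than keyframes")

    best_total, best_seq = float("inf"), None
    # Exhaustive search over label-to-keyframe assignments. This is only
    # practical for a handful of keyframes, which matches the small inputs
    # (about six keyframes) described in the abstract.
    for choice in permutations(range(n), len(labels)):
        mapping = dict(zip(labels, choice))
        seq = [mapping[label] for label in pattern]
        total = sum(cost[a][b] for a, b in zip(seq, seq[1:]))
        if total < best_total:
            best_total, best_seq = total, seq
    return best_seq, best_total


# Toy symmetric cost matrix for four keyframes (made-up numbers).
cost = [
    [0, 1, 4, 6],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [6, 5, 1, 0],
]
sequence, total = assign_keyframes("ABABCBCB", cost)
print(sequence, total)
```

In this sketch the pattern constrains which beats share a pose, and the search picks the keyframes that make consecutive transitions cheapest. The resulting per-beat sequence is the kind of structure between whose entries a video diffusion model would then synthesize the in-between frames.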