SMooDi: Stylized Motion Diffusion Model
This work addresses the need for rapid generation of diverse stylized motions in fields such as animation and gaming, though the contribution appears incremental because it builds on a pre-trained text-to-motion model.
The paper tackles stylized motion generation from content texts and style motion sequences by introducing SMooDi, a Stylized Motion Diffusion model; the authors report that it outperforms existing methods in experiments across various applications.
We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style motion sequences. Unlike existing methods that either generate motion of various content or transfer style from one sequence to another, SMooDi can rapidly generate motion across a broad range of content and diverse styles. To this end, we tailor a pre-trained text-to-motion model for stylization. Specifically, we propose style guidance to ensure that the generated motion closely matches the reference style, alongside a lightweight style adaptor that directs the motion towards the desired style while ensuring realism. Experiments across various applications demonstrate that our proposed framework outperforms existing methods in stylized motion generation.
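The abstract names "style guidance" without giving a formula, so the following is only a minimal sketch of how a guidance term of this kind could be combined with a text-conditioned diffusion prediction. It assumes a generic classifier-free-guidance-style blend of three noise predictions; this is not confirmed to be SMooDi's formulation. The function name `combine_guidance`, the weights `w_content` and `w_style`, and the three input predictions are all hypothetical names introduced for illustration.

```python
from typing import List, Sequence


def combine_guidance(
    eps_uncond: Sequence[float],
    eps_content: Sequence[float],
    eps_style: Sequence[float],
    w_content: float,
    w_style: float,
) -> List[float]:
    """Blend three noise predictions for one denoising step.

    ASSUMPTION: a generic classifier-free-guidance-style combination,
    not the paper's confirmed method.
      eps_uncond  : prediction with no conditioning
      eps_content : prediction conditioned on the content text
      eps_style   : prediction conditioned on content text and style motion
    """
    if not (len(eps_uncond) == len(eps_content) == len(eps_style)):
        raise ValueError("all noise predictions must have the same length")
    return [
        # Start from the unconditional prediction, push toward the content
        # condition, then push further along the style direction.
        u + w_content * (c - u) + w_style * (s - c)
        for u, c, s in zip(eps_uncond, eps_content, eps_style)
    ]


if __name__ == "__main__":
    guided = combine_guidance(
        eps_uncond=[0.0, 0.0],
        eps_content=[1.0, 2.0],
        eps_style=[3.0, 2.0],
        w_content=2.0,
        w_style=0.5,
    )
    print(guided)
```

In this sketch, setting `w_style` to 0 and `w_content` to 1 returns the content-conditioned prediction unchanged, which shows how a style term could be layered on top of an existing text-to-motion model without altering its content behaviour.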