CV GR · Nov 13, 2023

Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

Zhongfei Qing, Zhongang Cai, Zhitao Yang, Lei Yang

arXiv:2311.07446v1 13.123 citations h-index: 20

Originality: Highly original

AI Analysis

This work addresses a challenging problem for the animation, gaming, and film industries. It offers a comprehensive solution for text-driven character animation and is a novel contribution rather than an incremental one.

The paper tackles Story-to-Motion, the task of generating natural human motion from long text descriptions. It proposes a system that synthesizes controllable, infinitely long motions and trajectories aligned with the input text, and it reports outperforming previous state-of-the-art methods on three sub-tasks: trajectory following, temporal action composition, and motion blending.

Generating natural human motion from a story has the potential to transform the landscape of the animation, gaming, and film industries. A new and challenging task, Story-to-Motion, arises when characters are required to move to various locations and perform specific motions based on a long text description. This task demands a fusion of low-level control (trajectories) and high-level control (motion semantics). Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive: character control methods do not handle text descriptions, whereas text-to-motion methods lack position constraints and often produce unstable motions. In light of these limitations, we propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text. (1) We leverage contemporary Large Language Models to act as a text-driven motion scheduler that extracts a series of (text, position, duration) triplets from long text. (2) We develop a text-driven motion retrieval scheme that incorporates motion matching with motion-semantic and trajectory constraints. (3) We design a progressive mask transformer that addresses common artifacts in the transition motion, such as unnatural poses and foot sliding. Beyond its pioneering role as the first comprehensive solution for Story-to-Motion, our system is evaluated on three distinct sub-tasks: trajectory following, temporal action composition, and motion blending, where it outperforms previous state-of-the-art motion synthesis methods across the board. Homepage: https://story2motion.github.io/.
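
As a rough illustration of components (1) and (2) in the abstract, the sketch below parses a scheduler reply into (text, position, duration) entries and then picks a motion clip by combining a semantic cost with a trajectory cost. It is not the authors' implementation. The JSON reply format, the names `ScheduledAction`, `parse_schedule`, `retrieval_cost`, and `retrieve`, the toy 2-D embeddings, and the weights `w_sem` and `w_traj` are all assumptions made for illustration.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class ScheduledAction:
    text: str                      # action description, e.g. "sit down"
    position: tuple[float, float]  # target location on the ground plane
    duration: float                # seconds


def parse_schedule(llm_json: str) -> list[ScheduledAction]:
    """Parse a hypothetical JSON reply from the LLM motion scheduler."""
    return [
        ScheduledAction(item["text"], tuple(item["position"]), float(item["duration"]))
        for item in json.loads(llm_json)
    ]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieval_cost(query_emb, clip_emb, target_pos, clip_end_pos,
                   w_sem=1.0, w_traj=0.5):
    """Assumed cost: weighted sum of semantic mismatch and trajectory error."""
    semantic_cost = 1.0 - cosine(query_emb, clip_emb)
    trajectory_cost = math.dist(target_pos, clip_end_pos)
    return w_sem * semantic_cost + w_traj * trajectory_cost


def retrieve(query_emb, target_pos, database):
    """Return the database clip with the lowest combined cost."""
    return min(
        database,
        key=lambda clip: retrieval_cost(
            query_emb, clip["embedding"], target_pos, clip["end_pos"]
        ),
    )


# Toy usage with made-up data.
reply = '[{"text": "walk to the sofa", "position": [1.0, 2.0], "duration": 3.0}]'
schedule = parse_schedule(reply)
database = [
    {"name": "walk", "embedding": [1.0, 0.0], "end_pos": (1.0, 2.0)},
    {"name": "jump", "embedding": [0.0, 1.0], "end_pos": (5.0, 5.0)},
]
best = retrieve([1.0, 0.0], schedule[0].position, database)
```

In a real system the embeddings would come from a text and motion encoder, and the trajectory term would likely compare whole paths rather than end points. The transition-blending stage (3) is not covered by this sketch.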
