CV · Apr 23, 2024

You Think, You ACT: The New Task of Arbitrary Text to Motion Generation

arXiv:2404.14745v6 · 7 citations · h-index: 3
AI Analysis

This work addresses the need for more flexible and practical text-to-motion generation in complex industries, though it is incremental as it builds on existing datasets and methods.

The paper tackles the problem of generating human motions from arbitrary text inputs, not just limited action labels, by creating a new dataset, HUMANML3D++, and proposing a framework that extracts action instructions from text. The results show that this realistic setting is challenging, which should foster new research in practical applications such as virtual human interaction and robot behavior generation.

Text to Motion aims to generate human motions from texts. Existing settings rely on limited Action Texts that include action labels, which limits flexibility and practicality in scenarios that are difficult to describe directly. This paper extends limited Action Texts to arbitrary ones. Scene Texts without explicit action labels can enhance the practicality of models in complex and diverse industries such as virtual human interaction, robot behavior generation, and film production, while also supporting the exploration of potential implicit behavior patterns. However, newly introduced Scene Texts may yield multiple reasonable output results, causing significant challenges for existing data, frameworks, and evaluation. To address this practical issue, we first create a new dataset, HUMANML3D++, by extending the texts of the largest existing dataset, HUMANML3D. Second, we propose a simple yet effective framework that extracts action instructions from arbitrary texts and subsequently generates motions. Furthermore, we benchmark this new setting with multi-solution metrics to address the inadequacies of existing single-solution metrics. Extensive experiments indicate that Text to Motion in this realistic setting is challenging, fostering new research in this practical direction.
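The abstract does not say how the multi-solution metrics are defined, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that each motion is summarized by a small feature vector and that a Scene Text has several acceptable reference motions. The function names, the toy vectors, and the "hug"/"jump" labels are all hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_solution_error(generated, reference):
    """Score against exactly one reference motion (single-solution view)."""
    return l2(generated, reference)

def multi_solution_error(generated, references):
    """Score against the closest of several acceptable reference motions.

    Assumption: taking the minimum is one simple way to avoid penalizing
    a plausible output that differs from an arbitrarily chosen reference.
    """
    return min(l2(generated, r) for r in references)

# Toy 2-D "motion features" for one Scene Text with two plausible reactions.
# These values are made up purely for illustration.
references = {"hug": [1.0, 0.0], "jump": [0.0, 1.0]}
generated = [0.0, 1.0]  # the model happened to produce the "jump" reaction

single = single_solution_error(generated, references["hug"])
multi = multi_solution_error(generated, list(references.values()))

print(f"single-solution error vs 'hug': {single:.3f}")
print(f"multi-solution error (best match): {multi:.3f}")
```

In this toy case the single-solution score penalizes a reasonable output simply because it differs from the one reference that was picked, while the best-match score does not. That is the kind of inadequacy the abstract attributes to single-solution metrics.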
