FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
This work addresses multi-person motion synthesis in computer vision. It offers a novel method for a known bottleneck rather than a foundational advance.
Existing text-to-motion synthesis methods are restricted to single-person or two-person scenarios. The paper tackles this limitation by proposing FreeMotion, a unified framework that supports number-free motion generation for any number of individuals, and reports superior performance in experiments.
Text-to-motion synthesis is a crucial task in computer vision. Existing methods are limited in their universality: they are tailored to single-person or two-person scenarios and cannot be applied to generate motions for more individuals. To achieve number-free motion synthesis, this paper reconsiders motion generation and proposes to unify single-person and multi-person motion through the conditional motion distribution. Furthermore, a generation module and an interaction module are designed for our FreeMotion framework to decouple the process of conditional motion generation and ultimately support number-free motion synthesis. In addition, existing single-person spatial motion control methods can be seamlessly integrated into our framework, achieving precise control of multi-person motion. Extensive experiments demonstrate the superior performance of our method and its capability to infer single-person and multi-human motions simultaneously.
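
The decoupling described above can be illustrated with a toy sketch. It is not the paper's implementation: the function names `interaction_module`, `generation_module`, and `synthesize`, the scalar-per-frame motion representation, and the averaging and blending rules are all assumptions made only for illustration. The sketch shows one plausible reading of the idea, in which each person's motion is sampled conditioned on the text and on the motions already generated, so the same loop serves one person or many.

```python
import random
from typing import List, Optional

# Toy stand-in for a motion sequence: one scalar per frame (assumption).
Motion = List[float]


def interaction_module(others: List[Motion], num_frames: int) -> Optional[Motion]:
    """Hypothetical: summarise already generated motions into a per-frame condition."""
    if not others:
        return None  # single-person case: nothing to condition on
    return [sum(m[t] for m in others) / len(others) for t in range(num_frames)]


def generation_module(text: str, condition: Optional[Motion],
                      num_frames: int, rng: random.Random) -> Motion:
    """Hypothetical: sample one person's motion from the text and an optional condition."""
    base = (len(text) % 7) / 7.0  # toy dependence on the text prompt
    motion = [base + rng.gauss(0.0, 0.1) for _ in range(num_frames)]
    if condition is not None:
        # Toy blend of the sampled motion with the interaction condition.
        motion = [0.5 * m + 0.5 * c for m, c in zip(motion, condition)]
    return motion


def synthesize(text: str, num_people: int,
               num_frames: int = 4, seed: int = 0) -> List[Motion]:
    """Generate motions one person at a time, each conditioned on the earlier ones."""
    rng = random.Random(seed)
    motions: List[Motion] = []
    for _ in range(num_people):
        condition = interaction_module(motions, num_frames)
        motions.append(generation_module(text, condition, num_frames, rng))
    return motions
```

Under these assumptions, `synthesize("three people dance in a circle", 3)` returns three toy motion sequences, and calling it with `num_people=1` reduces to plain single-person generation because the interaction condition is absent.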