CV · Dec 13, 2024

MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow

arXiv:2412.09901v2 · 17 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the challenge of integrating style and content in motion generation for applications in animation or robotics, representing an incremental advancement with multimodal control.

The paper tackles the problem of generating motion sequences that conform to a target style while adhering to content prompts. It introduces a bidirectional control flow between style and content, which alleviates style-content conflicts and preserves style dynamics, and it extends style control to multiple modalities such as text and images. The authors report significant performance improvements over previous methods across different datasets.

Generating motion sequences that conform to a target style while adhering to the given content prompts requires accommodating both the content and the style. In existing methods, information usually flows only from style to content, which may cause conflicts between the two and harm their integration. In contrast, in this work we build a bidirectional control flow between the style and the content that also adjusts the style towards the content, so that the style-content collision is alleviated and the dynamics of the style are better preserved in the integration. Moreover, we extend stylized motion generation from one modality, i.e. the style motion, to multiple modalities including texts and images through contrastive learning, leading to flexible style control over motion generation. Extensive experiments demonstrate that our method significantly outperforms previous methods across different datasets, while also enabling multimodal signal control. The code of our method will be made publicly available.
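The abstract says only that text and image style signals are brought in "through contrastive learning" and gives no loss or architecture. As a rough illustration of what such an alignment objective can look like, the sketch below computes a CLIP-style symmetric InfoNCE loss between style-motion embeddings and embeddings from another modality. The choice of InfoNCE, the function names, and the temperature value are all assumptions made for illustration, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def symmetric_info_nce(motion_emb, other_emb, temperature=0.07):
    """Hypothetical alignment loss: motion_emb[i] and other_emb[i] are a matched pair.

    Matched pairs are pulled together and mismatched pairs in the batch are
    pushed apart, in both directions (motion -> other and other -> motion).
    """
    a = [l2_normalize(v) for v in motion_emb]
    b = [l2_normalize(v) for v in other_emb]
    n = len(a)
    # Cosine similarity of every motion embedding with every other-modality embedding.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Motion -> other: the correct "class" for row i is column i.
    loss_a2b = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Other -> motion: same computation on the transposed similarity matrix.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_b2a = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_a2b + loss_b2a)


if __name__ == "__main__":
    motion = [[1.0, 0.0], [0.0, 1.0]]   # toy style-motion embeddings
    text = [[0.9, 0.1], [0.1, 0.9]]     # toy text embeddings, paired by index
    print("aligned pairs: ", symmetric_info_nce(motion, text))
    print("shuffled pairs:", symmetric_info_nce(motion, text[::-1]))
```

With the toy inputs, correctly paired embeddings give a much lower loss than the same embeddings in shuffled order. That is the property a shared style space needs if a text or image prompt is to stand in for a style motion at inference time.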

