CV · Dec 8, 2025

MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion Transformer

Penghui Liu, Jiangshan Wang, Yutong Shen, Shanhui Mo, Chenyang Qi, Yue Ma

arXiv:2512.07500v1 · 2 citations · h-index: 2

Originality: Highly original

AI Analysis

This work addresses the challenge of controlling the motion of multiple objects in video generation. It is incremental in that it builds on existing DiT architectures, to which it adds novel components.

The paper tackles multi-object video motion transfer with Diffusion Transformers, targeting two obstacles: motion entanglement and the lack of object-level control. The authors report precise, semantically aligned, and temporally coherent motion transfer for multiple objects while maintaining high quality and scalability.

Multi-object video motion transfer poses significant challenges for Diffusion Transformer (DiT) architectures due to inherent motion entanglement and lack of object-level control. We present MultiMotion, a novel unified framework that overcomes these limitations. Our core innovation is Mask-aware Attention Motion Flow (AMF), which utilizes SAM2 masks to explicitly disentangle and control motion features for multiple objects within the DiT pipeline. Furthermore, we introduce RectPC, a high-order predictor-corrector solver for efficient and accurate sampling, particularly beneficial for multi-entity generation. To facilitate rigorous evaluation, we construct the first benchmark dataset specifically for DiT-based multi-object motion transfer. MultiMotion demonstrably achieves precise, semantically aligned, and temporally coherent motion transfer for multiple distinct objects, maintaining DiT's high quality and scalability. The code is provided in the supplementary material.
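The abstract does not spell out how RectPC works, so the following is only a generic illustration of the predictor-corrector idea it names, not the authors' solver. It is a minimal sketch assuming a scalar ODE dx/dt = v(x, t) and using Heun's second-order method: an Euler step predicts the next state, then a trapezoidal average of the two velocity evaluations corrects it. The function names and the toy velocity field `toy_velocity` are hypothetical and chosen only for this example.

```python
import math


def euler_step(v, x, t, h):
    """First-order baseline: a single explicit Euler update."""
    return x + h * v(x, t)


def heun_step(v, x, t, h):
    """Second-order predictor-corrector (Heun's method).

    This is a textbook scheme used for illustration; it is NOT RectPC.
    """
    k1 = v(x, t)
    x_pred = x + h * k1              # predictor: Euler guess for the next state
    k2 = v(x_pred, t + h)            # velocity evaluated at the predicted state
    return x + 0.5 * h * (k1 + k2)   # corrector: trapezoidal average of slopes


def integrate(step, v, x0, t0, t1, n_steps):
    """Integrate dx/dt = v(x, t) from t0 to t1 with a fixed step size."""
    h = (t1 - t0) / n_steps
    x = x0
    for i in range(n_steps):
        x = step(v, x, t0 + i * h, h)
    return x


def toy_velocity(x, t):
    """Hypothetical velocity field with known solution x(t) = x0 * exp(-t)."""
    return -x


if __name__ == "__main__":
    exact = math.exp(-1.0)
    for name, step in (("euler", euler_step), ("heun", heun_step)):
        approx = integrate(step, toy_velocity, 1.0, 0.0, 1.0, 10)
        print(f"{name}: error = {abs(approx - exact):.2e}")
```

With the same number of steps, the corrected update lands closer to the exact solution than plain Euler on this toy problem. That trade of one extra velocity evaluation per step for higher accuracy is the general motivation for predictor-corrector samplers.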
