CV · Mar 10

Training-free Motion Factorization for Compositional Video Generation

Zixuan Wang, Ziqin Zhou, Feng Chen, Duo Peng, Yixin Hu, Changsheng Li, Yinjie Lei

arXiv:2603.09104v1 · 12.5 · h-index: 33

Predicted impact: top 17% in CV · last 90 days

Originality: Incremental advance

AI Analysis

This work addresses compositional video generation, with applications such as animation and simulation, though it appears incremental because it builds on existing diffusion models.

The paper tackles the problem of generating videos that contain multiple instances with diverse appearance and motion. It proposes a motion factorization framework that decomposes motion into three categories (motionlessness, rigid motion, and non-rigid motion) and uses a planning-before-generation approach to reduce semantic ambiguities in the prompt. The authors report strong motion-synthesis performance on real-world benchmarks.

Compositional video generation aims to synthesize multiple instances with diverse appearance and motion, which is widely applicable in real-world scenarios. However, current approaches mainly focus on binding semantics and neglect the diverse motion categories specified in prompts. In this paper, we propose a motion factorization framework that decomposes complex motion into three primary categories: motionlessness, rigid motion, and non-rigid motion. Specifically, our framework follows a planning-before-generation paradigm. (1) During planning, we reason about motion laws on the motion graph to obtain frame-wise changes in the shape and position of each instance. This alleviates semantic ambiguities in the user prompt by organizing it into a structured representation of instances and their interactions. (2) During generation, we modulate the synthesis of distinct motion categories in a disentangled manner. Conditioned on the motion cues, guidance branches stabilize appearance in motionless regions, preserve rigid-body geometry, and regularize local non-rigid deformations. Crucially, our two modules are model-agnostic and can be seamlessly incorporated into various diffusion model architectures. Extensive experiments demonstrate that our framework achieves impressive performance in motion synthesis on real-world benchmarks. Our code will be released soon.
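To make the planning step more concrete, the following is a minimal sketch of what a frame-wise plan over the three motion categories could look like. It is not the authors' implementation (their code is not yet released): the names `MotionCategory`, `Box`, `Instance`, and `plan_boxes` are hypothetical, the linear interpolation is an assumed stand-in for the paper's reasoning about motion laws, and instance interactions on the motion graph are omitted entirely.

```python
from dataclasses import dataclass
from enum import Enum


class MotionCategory(Enum):
    # The three categories named in the abstract.
    MOTIONLESS = "motionlessness"
    RIGID = "rigid"
    NON_RIGID = "non-rigid"


@dataclass
class Box:
    # Hypothetical normalized layout box: center and size in [0, 1].
    cx: float
    cy: float
    w: float
    h: float


@dataclass
class Instance:
    # Hypothetical per-instance node: a name, a motion category,
    # and the layout at the first and last frame.
    name: str
    category: MotionCategory
    start: Box
    end: Box


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation; returns a at t=0 and b at t=1."""
    return (1.0 - t) * a + t * b


def plan_boxes(inst: Instance, num_frames: int) -> list[Box]:
    """Return one box per frame for a single instance (assumed rules).

    MOTIONLESS: position and shape stay fixed at the start box.
    RIGID:      position changes, shape (w, h) is preserved.
    NON_RIGID:  both position and shape may change.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    boxes = []
    for i in range(num_frames):
        t = 0.0 if num_frames == 1 else i / (num_frames - 1)
        s, e = inst.start, inst.end
        if inst.category is MotionCategory.MOTIONLESS:
            boxes.append(Box(s.cx, s.cy, s.w, s.h))
        elif inst.category is MotionCategory.RIGID:
            boxes.append(Box(lerp(s.cx, e.cx, t), lerp(s.cy, e.cy, t), s.w, s.h))
        else:
            boxes.append(Box(lerp(s.cx, e.cx, t), lerp(s.cy, e.cy, t),
                             lerp(s.w, e.w, t), lerp(s.h, e.h, t)))
    return boxes
```

Under these assumptions, a generation stage could consume the per-frame boxes as motion cues, for example to decide which regions a guidance branch should keep static and which should preserve shape.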
