CV · May 21

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

arXiv:2605.22818 · 86.9
AI Analysis

For video generation researchers, MotiMotion addresses the problem of unnatural outcomes from sparse, imprecise motion control by incorporating causal reasoning, though the improvement is incremental over existing methods.

MotiMotion reformulates motion-controlled video generation as a reasoning-then-generation problem, using a training-free vision-language reasoner to refine trajectories and hallucinate secondary motions. The resulting videos show more plausible object behaviors and interactions and are preferred over those from existing approaches.

Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially by missing secondary causal consequences. To address this, we introduce MotiMotion, a novel framework that reformulates motion control as a reasoning-then-generation problem. To encourage causally grounded and commonsense-consistent interactions, we leverage a training-free vision-language reasoner to refine image-space coordinates of primary trajectories and to hallucinate plausible secondary motions. To further improve motion naturalness, we propose a confidence-aware control scheme that modulates guidance strength, enabling the model to closely follow high-confidence plans while correcting artifacts under low-confidence inputs with its internal generative priors. To support systematic evaluation, we curate a new image-to-video benchmark, MotiBench, consisting of interaction-centric scenes where new events are triggered by motion. Both VLM-based evaluation and a human study on MotiBench demonstrate that MotiMotion produces videos with more plausible object behaviors and interactions, and is preferred over existing approaches.
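
The confidence-aware control idea in the abstract can be illustrated with a small sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the `PlannedTrajectory` container, the linear mapping in `guidance_weight`, the `w_min`/`w_max` bounds, and the `blend` step are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PlannedTrajectory:
    """Hypothetical container for one trajectory proposed by the reasoner."""
    points: List[Tuple[float, float]]  # image-space (x, y), one entry per frame
    confidence: float                  # assumed to lie in [0, 1]
    is_secondary: bool = False         # True for inferred (not user-drawn) motion


def guidance_weight(confidence: float, w_min: float = 0.2, w_max: float = 1.0) -> float:
    """Map confidence to guidance strength by linear interpolation (assumed form)."""
    c = min(max(confidence, 0.0), 1.0)  # clamp out-of-range scores
    return w_min + (w_max - w_min) * c


def blend(prior: List[float], controlled: List[float], weight: float) -> List[float]:
    """Move from the model's prior prediction toward the trajectory-controlled one."""
    return [p + weight * (q - p) for p, q in zip(prior, controlled)]


# A refined primary trajectory (high confidence) and an inferred secondary
# motion (low confidence); the coordinates are made-up illustration values.
plan = [
    PlannedTrajectory(points=[(10.0, 20.0), (14.0, 20.0)], confidence=0.9),
    PlannedTrajectory(points=[(40.0, 22.0), (41.0, 25.0)], confidence=0.25, is_secondary=True),
]
weights = [guidance_weight(t.confidence) for t in plan]
```

Under these assumptions, the high-confidence trajectory receives a weight near `w_max`, so generation follows it closely, while the low-confidence one receives a weight nearer `w_min`, leaving more room for the model's generative prior.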

