CV · Mar 1

Let Your Image Move with Your Motion! -- Implicit Multi-Object Multi-Motion Transfer

arXiv:2603.01000v1 · 1 citation · h-index: 5
Originality: Highly original
AI Analysis

This paper addresses controllable video generation for scenes in which multiple objects each require an independent motion, a significant advance over single-object methods.

The paper tackles the problem of transferring multiple distinct motion patterns to different objects in a static image for video generation, presenting FlexiMMT, which achieves state-of-the-art performance in multi-object, multi-motion transfer.

Motion transfer has emerged as a promising direction for controllable video generation, yet existing methods largely focus on single-object scenarios and struggle when multiple objects require distinct motion patterns. In this work, we present FlexiMMT, the first implicit image-to-video (I2V) motion transfer framework that explicitly enables multi-object, multi-motion transfer. Given a static multi-object image and multiple reference videos, FlexiMMT independently extracts motion representations and accurately assigns them to different objects, supporting flexible recombination and arbitrary motion-to-object mappings. To address the core challenge of cross-object motion entanglement, we introduce a Motion Decoupled Mask Attention Mechanism that uses object-specific masks to constrain attention, ensuring that motion and text tokens only influence their designated regions. We further propose a Differentiated Mask Propagation Mechanism that derives object-specific masks directly from diffusion attention and progressively propagates them across frames efficiently. Extensive experiments demonstrate that FlexiMMT achieves precise, compositional, and state-of-the-art performance in I2V-based multi-object multi-motion transfer.
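
The mask-constrained attention described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified setting in which each spatial token attends only to conditioning (motion or text) tokens, and every name here (`build_allowed`, `masked_softmax`, `tokens_per_motion`) is hypothetical. It shows only the general idea of using per-object masks so that the tokens of one motion influence one object's region.

```python
import math

def build_allowed(object_masks, tokens_per_motion):
    """allowed[i][j] is True when spatial token i may attend to conditioning token j.

    object_masks: one 0/1 list per object, each of length num_spatial_tokens.
    Conditioning tokens are assumed to be laid out object by object.
    """
    num_spatial = len(object_masks[0])
    allowed = []
    for i in range(num_spatial):
        row = []
        for mask in object_masks:
            row.extend([bool(mask[i])] * tokens_per_motion)
        allowed.append(row)
    return allowed

def masked_softmax(scores, allowed_row):
    """Softmax over permitted keys only; blocked keys receive weight 0."""
    kept = [s for s, ok in zip(scores, allowed_row) if ok]
    if not kept:
        # Background token: no object covers it, so nothing is attended to.
        return [0.0] * len(scores)
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if ok else 0.0
            for s, ok in zip(scores, allowed_row)]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention_weights(score_matrix, allowed):
    """Apply the masked softmax row by row."""
    return [masked_softmax(r, a) for r, a in zip(score_matrix, allowed)]

# Toy setup: 4 spatial tokens, 2 objects, 2 conditioning tokens per motion.
# Object 0 covers tokens 0-1, object 1 covers token 2, token 3 is background.
object_masks = [[1, 1, 0, 0], [0, 0, 1, 0]]
allowed = build_allowed(object_masks, tokens_per_motion=2)
scores = [[0.0, 0.0, 0.0, 0.0] for _ in range(4)]  # placeholder uniform scores
weights = masked_attention_weights(scores, allowed)
```

In this toy setup, the tokens covered by object 0 place all of their attention weight on the first motion's tokens, the token covered by object 1 attends only to the second motion's tokens, and the background token receives no motion conditioning at all. A full model would also include attention among the video tokens themselves, which this sketch leaves out.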