CV | AI | Dec 6, 2024

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models

arXiv:2412.05275v1 | 13 citations | h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the limited motion control of video diffusion models, which restricts their practical use, and proposes a new method for this known bottleneck.

The paper tackles fine-grained motion control in text-to-video models by introducing MotionFlow, a training-free framework that uses cross-attention maps for motion transfer. The authors report that it significantly outperforms existing models in both fidelity and versatility.

Text-to-video models have demonstrated impressive capabilities in producing diverse and captivating video content, showcasing a notable advancement in generative AI. However, these models generally lack fine-grained control over motion patterns, limiting their practical applicability. We introduce MotionFlow, a novel framework designed for motion transfer in video diffusion models. Our method utilizes cross-attention maps to accurately capture and manipulate spatial and temporal dynamics, enabling seamless motion transfers across various contexts. Our approach does not require training and operates at test time by leveraging the inherent capabilities of pre-trained video diffusion models. In contrast to traditional approaches, which struggle with comprehensive scene changes while maintaining consistent motion, MotionFlow successfully handles such complex transformations through its attention-based mechanism. Our qualitative and quantitative experiments demonstrate that MotionFlow significantly outperforms existing models in both fidelity and versatility, even during drastic scene alterations.
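
To make the idea of a cross-attention map concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the single-head attention, the toy two-dimensional vectors, and the mean-squared-error `motion_loss` are all assumptions chosen for illustration, and every function name is hypothetical. It shows how the attention of spatial positions to one text token traces where a subject sits in each frame, and how a mismatch score between a source video's maps and a generated video's maps could act as a test-time guidance signal.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_map(queries, keys):
    """queries: one vector per spatial position (from video latents).
    keys: one vector per text token (from the prompt).
    Returns attn[position][token]; each row sums to 1."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys])
        for q in queries
    ]

def token_map(attn, token_index):
    # Attention that each spatial position pays to one chosen token.
    return [row[token_index] for row in attn]

def motion_loss(source_maps, generated_maps):
    # Assumed objective: mean squared error between per-frame token maps.
    total, count = 0.0, 0
    for src, gen in zip(source_maps, generated_maps):
        for s, g in zip(src, gen):
            total += (s - g) ** 2
            count += 1
    return total / count

# Toy data: token 0 stands for the subject, token 1 for the background.
keys = [[1.0, 0.0], [0.0, 1.0]]
# Source video: the subject moves from position 0 (frame 0) to position 1 (frame 1).
source_frames = [
    [[4.0, 0.0], [0.0, 4.0], [0.0, 4.0]],
    [[0.0, 4.0], [4.0, 0.0], [0.0, 4.0]],
]
# Generated video before guidance: the subject stays at position 0.
static_frames = [source_frames[0], source_frames[0]]

source_maps = [token_map(cross_attention_map(f, keys), 0) for f in source_frames]
static_maps = [token_map(cross_attention_map(f, keys), 0) for f in static_frames]

matched = motion_loss(source_maps, source_maps)
mismatched = motion_loss(source_maps, static_maps)
```

In this toy setup `matched` is zero and `mismatched` is positive, so lowering the score pushes the generated subject's map toward the source trajectory. In a real diffusion pipeline the queries would come from the denoiser's latent features, and the latents would be updated with the gradient of such a score at each sampling step; that part is omitted here.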
