CV · AI · Mar 24, 2025

Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance

arXiv:2503.18386v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses resource efficiency and control issues in video generation for content creation applications, representing an incremental improvement over existing methods.

The paper tackles the challenges of high training costs, data requirements, and text-motion inconsistency in text-to-video generation by proposing a mask-guided method that uses mask motion sequences for control, achieving improved consistency and quality in tasks like video editing and artistic video generation.

Recent advances in diffusion models bring new vitality to visual content creation. However, current text-to-video generation models still face significant challenges such as high training costs, substantial data requirements, and difficulties in maintaining consistency between given text and motion of the foreground object. To address these challenges, we propose mask-guided video generation, which can control video generation through mask motion sequences, while requiring limited training data. Our model enhances existing architectures by incorporating foreground masks for precise text-position matching and motion trajectory control. Through mask motion sequences, we guide the video generation process to maintain consistent foreground objects throughout the sequence. Additionally, through a first-frame sharing strategy and autoregressive extension approach, we achieve more stable and longer video generation. Extensive qualitative and quantitative experiments demonstrate that this approach excels in various video generation tasks, such as video editing and generating artistic videos, outperforming previous methods in terms of consistency and quality. Our generated results can be viewed in the supplementary materials.
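
To make the idea of a mask motion sequence concrete, here is a minimal sketch that is not taken from the paper: it assumes the control signal is one binary foreground mask per frame, produced by translating a first-frame mask at a constant velocity. The names `shift_mask` and `mask_motion_sequence` and the linear trajectory are hypothetical choices for illustration; the authors' actual mask construction may differ.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: mask[y][x] == 1 on the foreground object


def shift_mask(mask: Mask, dx: int, dy: int) -> Mask:
    """Translate a binary mask by (dx, dy); pixels leaving the frame are dropped."""
    height, width = len(mask), len(mask[0])
    shifted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    shifted[ny][nx] = 1
    return shifted


def mask_motion_sequence(
    first_mask: Mask, velocity: Tuple[int, int], num_frames: int
) -> List[Mask]:
    """Return one mask per frame; frame t is the first mask moved by t * velocity.

    Assumption for illustration only: a constant-velocity, translation-only path.
    """
    vx, vy = velocity
    return [shift_mask(first_mask, t * vx, t * vy) for t in range(num_frames)]


# A 2x2 foreground block in the top-left corner of a 4x6 frame.
first = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
sequence = mask_motion_sequence(first, velocity=(1, 0), num_frames=4)
for t, frame_mask in enumerate(sequence):
    print(t, frame_mask[0])  # top row of each frame's mask
```

In a mask-guided pipeline of the kind the abstract describes, a sequence like `sequence` would be supplied to the generator alongside the text prompt, so that the foreground object is placed where the mask is in each frame.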

