CV | May 22, 2024

ReVideo: Remake a Video with Motion and Content Control

arXiv:2405.13865v1 | 72 citations | h-index: 26 | NIPS
Originality: Incremental advance
AI Analysis

This paper addresses the problem of accurate video editing for users in creative fields. It combines content control and motion control in a single approach, though it builds incrementally on existing diffusion-model methods.

The paper tackles the challenge of precise, localized video editing by introducing ReVideo, which lets users control both content and motion in specific areas of a video. It reports promising performance in three applications: changing content while preserving motion, customizing motion trajectories while preserving content, and modifying both at once.

Despite significant advancements in video generation and editing using diffusion models, achieving accurate and localized video editing remains a substantial challenge. Additionally, most existing video editing methods primarily focus on altering visual content, with limited research dedicated to motion editing. In this paper, we present a novel attempt to Remake a Video (ReVideo) which stands out from existing methods by allowing precise video editing in specific areas through the specification of both content and motion. Content editing is facilitated by modifying the first frame, while the trajectory-based motion control offers an intuitive user interaction experience. ReVideo addresses a new task involving the coupling and training imbalance between content and motion control. To tackle this, we develop a three-stage training strategy that progressively decouples these two aspects from coarse to fine. Furthermore, we propose a spatiotemporal adaptive fusion module to integrate content and motion control across various sampling steps and spatial locations. Extensive experiments demonstrate that our ReVideo has promising performance on several accurate video editing applications, i.e., (1) locally changing video content while keeping the motion constant, (2) keeping content unchanged and customizing new motion trajectories, (3) modifying both content and motion trajectories. Our method can also seamlessly extend these applications to multi-area editing without specific training, demonstrating its flexibility and robustness.
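To make the idea of fusing content and motion control "across various sampling steps and spatial locations" more concrete, here is a minimal sketch of one generic way such a fusion could work: a gate computed per spatial location and per sampling step blends two feature maps. This is an illustration under stated assumptions, not the paper's module. The function name `adaptive_fuse`, the sigmoid gate, the scalar weights, and the normalized step `t` are all hypothetical; in a real model the gate would be learned and would act on high-dimensional features.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_fuse(content, motion, t,
                  w_content=1.0, w_motion=-1.0, w_time=2.0, bias=0.0):
    """Blend two H x W feature maps with a per-location, per-step gate.

    content, motion: lists of rows of floats with the same shape
                     (stand-ins for content and motion control features).
    t:               normalized sampling step in [0, 1] (hypothetical).
    w_*, bias:       hypothetical scalar gate parameters; a trained model
                     would learn these rather than fix them by hand.
    """
    fused = []
    for c_row, m_row in zip(content, motion):
        row = []
        for c, m in zip(c_row, m_row):
            # The gate depends on the local features and on the sampling
            # step, so the blend varies across both space and time.
            gate = sigmoid(w_content * c + w_motion * m + w_time * t + bias)
            row.append(gate * c + (1.0 - gate) * m)
        fused.append(row)
    return fused


if __name__ == "__main__":
    content = [[1.0, 0.0], [0.5, 2.0]]
    motion = [[0.0, 1.0], [0.5, -1.0]]
    for step in (0.0, 0.5, 1.0):
        print(step, adaptive_fuse(content, motion, step))
```

Because each output value is a convex combination of the two inputs, it always lies between the content and motion values at that location. With the positive `w_time` assumed here, the gate leans further toward the content features as `t` grows; the direction and strength of that dependence are arbitrary choices for the sketch.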

