CV, Aug 28, 2023

MagicEdit: High-Fidelity and Temporally Coherent Video Editing

arXiv:2308.14749v1, 77 citations, h-index: 46
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of maintaining both per-frame quality and temporal consistency in text-guided video editing, for applications such as stylization and outpainting. The contribution appears incremental, since it builds on existing video editing methods.

The authors tackle text-guided video editing with MagicEdit, which explicitly disentangles the learning of content, structure, and motion signals during training to achieve high-fidelity and temporally coherent video-to-video translation.

In this report, we present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task. We found that high-fidelity and temporally coherent video-to-video translation can be achieved by explicitly disentangling the learning of content, structure and motion signals during training. This is in contrast to most existing methods, which attempt to jointly model both the appearance and temporal representations within a single framework, which, we argue, leads to degradation in per-frame quality. Despite its simplicity, we show that MagicEdit supports various downstream video editing tasks, including video stylization, local editing, video-MagicMix and video outpainting.

