CV · Mar 14, 2024

Video Editing via Factorized Diffusion Distillation

arXiv:2403.09334v2 · 43 citations · ECCV
AI Analysis

This paper addresses high-quality video editing, a known bottleneck for AI and media applications, and proposes a novel method for it.

The paper tackles video editing without any supervised video editing data by introducing Emu Video Edit (EVE), which achieves state-of-the-art results through a novel unsupervised distillation method called Factorized Diffusion Distillation.

We introduce Emu Video Edit (EVE), a model that establishes a new state of the art in video editing without relying on any supervised video editing data. To develop EVE, we separately train an image editing adapter and a video generation adapter, and attach both to the same text-to-image model. Then, to align the adapters towards video editing, we introduce a new unsupervised distillation procedure, Factorized Diffusion Distillation. This procedure distills knowledge from one or more teachers simultaneously, without any supervised data. We use this procedure to teach EVE to edit videos by jointly distilling knowledge to (i) precisely edit each individual frame, from the image editing adapter, and (ii) ensure temporal consistency among the edited frames, using the video generation adapter. Finally, to demonstrate the potential of our approach in unlocking other capabilities, we align additional combinations of adapters
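The factorized objective described in the abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's implementation: videos are short lists of pixel values, both teachers (`edit_teacher`, `temporal_teacher`) are hypothetical hand-written stand-ins for the image editing and video generation adapters, and each teacher's signal is a plain squared error against a stop-gradient teacher target rather than the losses the paper actually uses. What it does show is the factorization: one teacher scores each frame independently for the edit, another scores the whole clip for temporal consistency, and a single student is updated against both at once.

```python
from typing import List

Frame = List[float]  # toy stand-in: a frame is a flat list of pixel values
Video = List[Frame]


def mse(a: Frame, b: Frame) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def edit_teacher(frame: Frame, shift: float = 1.0) -> Frame:
    # HYPOTHETICAL image-editing teacher: edits one frame in isolation
    # (here the "edit" is just a brightness shift).
    return [x + shift for x in frame]


def temporal_teacher(video: Video) -> Video:
    # HYPOTHETICAL video teacher: a 3-tap temporal moving average per pixel,
    # standing in for "a temporally consistent version of this clip".
    n = len(video)
    out = []
    for t in range(n):
        nbrs = [video[s] for s in (t - 1, t, t + 1) if 0 <= s < n]
        out.append([sum(px) / len(nbrs) for px in zip(*nbrs)])
    return out


def factorized_loss(student: Video, source: Video,
                    w_edit: float = 1.0, w_temp: float = 1.0) -> float:
    # Factor 1: per-frame edit fidelity, scored by the image teacher frame by frame.
    edit = sum(mse(s, edit_teacher(f)) for s, f in zip(student, source)) / len(student)
    # Factor 2: temporal consistency, scored by the video teacher on the whole clip.
    smooth = temporal_teacher(student)
    temp = sum(mse(s, m) for s, m in zip(student, smooth)) / len(student)
    return w_edit * edit + w_temp * temp


def distill_step(student: Video, source: Video, lr: float = 0.2,
                 w_edit: float = 1.0, w_temp: float = 1.0) -> Video:
    # Query both teachers at once; treat their outputs as fixed targets
    # (stop-gradient) and take one gradient step on the squared-error terms
    # (up to a constant scale), pulling the student toward both targets.
    smooth = temporal_teacher(student)
    new = []
    for s, f, m in zip(student, source, smooth):
        e = edit_teacher(f)
        new.append([x - lr * 2 * (w_edit * (x - a) + w_temp * (x - b))
                    for x, a, b in zip(s, e, m)])
    return new


def flicker(video: Video) -> float:
    # Largest frame-to-frame change of any pixel: a crude temporal-inconsistency measure.
    return max(abs(b - a) for f0, f1 in zip(video, video[1:]) for a, b in zip(f0, f1))


source = [[0.0, 0.5], [1.0, 0.5], [0.0, 0.5], [1.0, 0.5]]  # flickering toy clip
student = [frame[:] for frame in source]  # student starts from the unedited clip
before = factorized_loss(student, source)
for _ in range(50):
    student = distill_step(student, source)
after = factorized_loss(student, source)
print(f"loss {before:.3f} -> {after:.3f}, flicker {flicker(source):.2f} -> {flicker(student):.2f}")
```

In this toy run the combined loss falls and the frame-to-frame flicker of the clip shrinks while the brightness edit is still applied. The weights `w_edit` and `w_temp` (also assumptions) trade edit fidelity against temporal smoothness; neither teacher alone would produce a clip that is both edited and stable.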
