CV | Dec 20, 2023

Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

arXiv:2312.13834v1 | 43 citations | h-index: 33 | CVPR
Originality: Incremental advance
AI Analysis

The work addresses the need for fast, high-quality video editing tools, though the advance is incremental because it builds on existing image-editing diffusion models.

The paper tackles video-to-video synthesis by adapting image-editing diffusion models to video editing. It reports a speedup of at least 44x over prior work, generating 120-frame videos in 14 seconds, and a user study rates its output quality above that of established methods.
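To put the reported figures in perspective, the short calculation below works out what they imply. The 30 FPS playback rate comes from the abstract. The baseline estimate assumes the 44x factor is measured on the same clip, which this summary does not state, so treat it as a rough lower bound rather than a reported number.

```python
# Figures quoted in the summary and abstract.
frames = 120          # frames per generated video
playback_fps = 30     # playback rate stated in the abstract
gen_seconds = 14      # reported generation time
min_speedup = 44      # "at least 44x" over prior work

clip_seconds = frames / playback_fps          # length of the generated clip
throughput = frames / gen_seconds             # frames generated per second
slowdown_vs_realtime = gen_seconds / clip_seconds

# Assumption: the speedup is measured on the same clip, so a prior
# method would need at least this long for the same 120 frames.
baseline_seconds = min_speedup * gen_seconds

print(f"clip length: {clip_seconds:.1f} s")
print(f"throughput: {throughput:.2f} frames/s")
print(f"generation is {slowdown_vs_realtime:.1f}x slower than real time")
print(f"implied baseline: at least {baseline_seconds} s")
```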

In this paper, we introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion models, enhancing them for video editing applications. Our approach centers on the concept of anchor-based cross-frame attention, a mechanism that implicitly propagates diffusion features across frames, ensuring superior temporal coherence and high-fidelity synthesis. Fairy not only addresses limitations of previous models, including memory and processing speed, but also improves temporal consistency through a unique data augmentation strategy. This strategy renders the model equivariant to affine transformations in both source and target images. Remarkably efficient, Fairy generates 120-frame 512x384 videos (4-second duration at 30 FPS) in just 14 seconds, outpacing prior works by at least 44x. A comprehensive user study, involving 1000 generated samples, confirms that our approach delivers superior quality, decisively outperforming established methods.
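The abstract does not spell out how anchor-based cross-frame attention is computed, so the following is only a minimal sketch of one plausible reading: keys and values are taken from a few anchor frames, cached once, and every frame's queries attend to that shared set instead of to the frame's own keys and values. The function names, the toy data layout (a dict of `q`, `k`, `v` token lists per frame), and the choice of anchors are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = math.sqrt(len(keys[0]))
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]
        )
    return outputs


def anchor_cross_frame_attention(frames, anchor_ids):
    """Hypothetical helper: every frame attends to the anchors' keys/values.

    The anchor keys and values are gathered once and reused for all frames,
    so frames share the same feature source and can be processed
    independently of one another.
    """
    anchor_k = [k for i in anchor_ids for k in frames[i]["k"]]
    anchor_v = [v for i in anchor_ids for v in frames[i]["v"]]
    return [attend(f["q"], anchor_k, anchor_v) for f in frames]


def toy_frame(rng, n_tokens=3, dim=4):
    """Random stand-in for one frame's query/key/value tokens."""
    return {
        name: [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
        for name in ("q", "k", "v")
    }


rng = random.Random(0)
frames = [toy_frame(rng) for _ in range(6)]
out = anchor_cross_frame_attention(frames, anchor_ids=[0, 3])
print(len(out), "frames,", len(out[0]), "tokens each,", len(out[0][0]), "dims")
```

Because each output token is a weighted average of the same anchor values, two frames with similar queries receive similar features, which is one intuition for why sharing anchors could help temporal coherence. Since no frame depends on another frame's output, the per-frame calls could also run in parallel.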

