CV | Oct 31, 2023

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

arXiv:2310.20700v2 | 229 citations | h-index: 41
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating story-level long videos for applications in video generation and editing, representing an incremental advancement over existing short-clip methods.

The paper tackles the problem of generating coherent long videos with smooth transitions between scenes. It introduces SEINE, a short-to-long video diffusion model that combines a random-mask approach with text-based control; the authors report that experiments show it outperforming existing methods on generative transition and prediction.

Recently, video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips ("shot-level") depicting a single scene. To deliver a coherent long video ("story-level"), it is desirable to have creative transition and prediction effects across different clips. This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction. The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos. Specifically, we propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions. By providing the images of different scenes as inputs, combined with text-based control, our model generates transition videos that ensure coherence and visual quality. Furthermore, the model can be readily extended to various tasks such as image-to-video animation and autoregressive video prediction. To conduct a comprehensive evaluation of this new generative task, we propose three assessment criteria for smooth and creative transition: temporal consistency, semantic similarity, and video-text semantic alignment. Extensive experiments validate the effectiveness of our approach over existing methods for generative transition and prediction, enabling the creation of story-level long videos. Project page: https://vchitect.github.io/SEINE-project/
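
The abstract says only that the model is a "random-mask" video diffusion model conditioned on given scene images. As a rough illustration of how one frame-level mask could cover transition, image-to-video animation, and prediction, here is a minimal sketch. The function names (`frame_mask`, `apply_mask`), the task labels, and the 0.5 keep probability are hypothetical choices made for this example and are not taken from the paper or its code.

```python
import random


def frame_mask(num_frames, task, num_context=1, rng=None):
    """Return a list of 0/1 flags, one per frame.

    1 = frame is supplied as a condition, 0 = frame must be generated.
    The task names and mask layouts here are illustrative assumptions.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")
    mask = [0] * num_frames
    if task == "transition":
        # Both end frames (two different scenes) are given.
        mask[0] = 1
        mask[-1] = 1
    elif task == "animation":
        # Only the first frame (a still image) is given.
        mask[0] = 1
    elif task == "prediction":
        # The first num_context frames are given; the rest are predicted.
        if not 1 <= num_context < num_frames:
            raise ValueError("num_context must be in [1, num_frames)")
        for i in range(num_context):
            mask[i] = 1
    elif task == "random":
        # Training-style mask: each frame is kept with probability 0.5
        # (an assumed value, chosen only for illustration).
        rng = rng or random.Random()
        mask = [1 if rng.random() < 0.5 else 0 for _ in range(num_frames)]
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


def apply_mask(frames, mask, fill=None):
    """Keep conditioned frames and replace the others with a placeholder."""
    return [frame if keep else fill for frame, keep in zip(frames, mask)]
```

For example, `frame_mask(5, "transition")` marks only the first and last frames as given, so a model conditioned this way would have to fill in the three frames between the two scenes.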

Code Implementations: 1 repo
