CV AI · Jan 9, 2024

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

Weimin Wang, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

arXiv:2401.04468v1 · 26.561 citations · h-index: 23

Originality: Incremental advance

AI Analysis

This work addresses the growing demand for high-quality text-to-video generation, which matters to content creators and AI applications. The advance appears incremental: it combines existing components into a new pipeline.

The paper introduces MagicVideo-V2, an end-to-end pipeline for generating high-fidelity videos from text descriptions. It integrates a text-to-image model, a video motion generator, a reference image embedding module, and a frame interpolation module. In a large-scale user evaluation, the authors report better aesthetic quality and smoothness than leading systems such as Runway and Stable Video Diffusion.

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2 that integrates the text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architecture designs, MagicVideo-V2 can generate an aesthetically pleasing, high-resolution video with remarkable fidelity and smoothness. It demonstrates superior performance over leading Text-to-Video systems such as Runway, Pika 1.0, Morph, Moon Valley and Stable Video Diffusion model via user evaluation at large scale.
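The four modules named in the abstract can be pictured as stages chained into one function. The sketch below is only a toy illustration of that multi-stage structure, not the authors' implementation: every function name, the stage ordering, and the list-of-floats "image" are assumptions made for the example, and each stage body is a trivial placeholder where the real system would call a learned model.

```python
from dataclasses import dataclass
from typing import List

# Toy stand-in for an image: a flat list of pixel values (assumption for this sketch).
Frame = List[float]


@dataclass
class Video:
    frames: List[Frame]


def text_to_image(prompt: str, size: int = 4) -> Frame:
    """Hypothetical placeholder for the text-to-image model.

    Derives a deterministic toy 'image' from the prompt characters.
    """
    seed = sum(ord(c) for c in prompt)
    return [float((seed + i) % 256) for i in range(size)]


def embed_reference(image: Frame) -> float:
    """Hypothetical placeholder for the reference image embedding module.

    Here the 'embedding' is just the mean pixel value.
    """
    return sum(image) / len(image)


def generate_motion(image: Frame, ref: float, num_keyframes: int = 3) -> Video:
    """Hypothetical placeholder for the video motion generator.

    Produces keyframes that drift from the reference image toward the embedding value.
    """
    frames = []
    for t in range(num_keyframes):
        alpha = t / max(num_keyframes - 1, 1)
        frames.append([(1 - alpha) * p + alpha * ref for p in image])
    return Video(frames)


def interpolate_frames(video: Video) -> Video:
    """Placeholder for the frame interpolation module.

    Inserts the linear midpoint between each pair of consecutive keyframes.
    """
    out: List[Frame] = []
    for a, b in zip(video.frames, video.frames[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])
    out.append(video.frames[-1])
    return Video(out)


def generate_video(prompt: str) -> Video:
    """Chain the stages: text -> image -> embedding -> keyframes -> smoother video."""
    image = text_to_image(prompt)
    ref = embed_reference(image)
    keyframes = generate_motion(image, ref)
    return interpolate_frames(keyframes)


if __name__ == "__main__":
    video = generate_video("a cat surfing at sunset")
    print(len(video.frames), "frames")
```

Keeping each stage behind its own function mirrors the modular design the abstract describes: any placeholder could be swapped for a real model without changing how the stages are chained.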
