CV, AI | May 19, 2025

MAGI-1: Autoregressive Video Generation at Scale

Sand.ai, Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, W. Q. Zhang, Weifeng Luo, Xiaoyang Kang

arXiv:2505.13211v148.4235 citations | h-index: 9 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses video generation, offering a controllable and scalable approach, but it appears incremental because it builds on existing autoregressive and denoising methods.

The researchers developed MAGI-1, an autoregressive world model that generates video chunk by chunk and is trained to denoise per-chunk noise that increases monotonically over time. It achieves strong performance on image-to-video tasks with high temporal consistency, and its largest variant has 24 billion parameters and supports context lengths of up to 4 million tokens.

We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.
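The chunk-wise scheme described in the abstract can be illustrated with a toy simulation. The sketch below is not MAGI-1's implementation: it contains no neural network, and the function name `simulate_streaming`, the `window` parameter, and the linear noise schedule are all assumptions made for illustration. It only shows how a pipeline in which older chunks are cleaner than newer ones emits chunks in causal order while keeping a bounded number of chunks in flight, whatever the video length.

```python
from collections import deque


def simulate_streaming(total_chunks: int, window: int):
    """Toy pipeline: each chunk needs `window` denoising steps (an assumption).

    Returns the order in which chunks are emitted and, for every step,
    the noise level of each in-flight chunk (oldest first, 1.0 = pure noise).
    """
    active = deque()  # entries: [chunk_id, remaining_denoise_steps]
    emitted, snapshots = [], []
    next_id = 0
    while len(emitted) < total_chunks:
        # Admit one new, fully noisy chunk if the window has room.
        if next_id < total_chunks and len(active) < window:
            active.append([next_id, window])
            next_id += 1
        # Hypothetical linear schedule: noise = remaining steps / window.
        snapshots.append([steps / window for _, steps in active])
        # One denoising step for every in-flight chunk.
        for entry in active:
            entry[1] -= 1
        # The oldest chunk is always the cleanest; emit it once fully denoised.
        while active and active[0][1] == 0:
            emitted.append(active.popleft()[0])
    return emitted, snapshots


emitted, snapshots = simulate_streaming(total_chunks=10, window=4)
peak_active = max(len(levels) for levels in snapshots)
print(emitted)       # chunks leave the pipeline in temporal order
print(peak_active)   # number of chunks in flight never exceeds the window
```

In this toy setting the noise levels within each snapshot never decrease from the oldest chunk to the newest, and the peak number of in-flight chunks equals `window` for any `total_chunks` at least that large, which mirrors the abstract's claim of constant peak inference cost regardless of video length.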
