CVOct 23, 2025

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

Yihao Meng, Hao Ouyang, Yue Yu, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Hanlin Wang, Yixuan Li, Cheng Chen, Yanhong Zeng, Yujun Shen, Huamin Qu

arXiv:2510.20822v127.632 citationsh-index: 10Has Code

Originality Highly original

AI Analysis

This work addresses the challenge of automated filmmaking for content creators, moving beyond clip synthesis to holistic scene generation.

The paper tackles the problem of generating coherent multi-shot long video narratives from text, bridging the 'narrative gap' in text-to-video models, and achieves a new state-of-the-art in narrative coherence with emergent abilities like persistent memory and cinematic understanding.

State-of-the-art text-to-video models excel at generating isolated clips but fall short of creating the coherent, multi-shot narratives, which are the essence of storytelling. We bridge this "narrative gap" with HoloCine, a model that generates entire scenes holistically to ensure global consistency from the first shot to the last. Our architecture achieves precise directorial control through a Window Cross-Attention mechanism that localizes text prompts to specific shots, while a Sparse Inter-Shot Self-Attention pattern (dense within shots but sparse between them) ensures the efficiency required for minute-scale generation. Beyond setting a new state-of-the-art in narrative coherence, HoloCine develops remarkable emergent abilities: a persistent memory for characters and scenes, and an intuitive grasp of cinematic techniques. Our work marks a pivotal shift from clip synthesis towards automated filmmaking, making end-to-end cinematic creation a tangible future. Our code is available at: https://holo-cine.github.io/.

View on arXiv PDF

Similar