CV | Dec 2, 2025

MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

arXiv:2512.03041v1 | 17 citations | h-index: 20
Originality: Highly original
AI Analysis

The paper addresses the problem of producing coherent multi-shot videos for applications such as filmmaking and storytelling. It proposes a novel method for a known bottleneck in video generation.

The paper tackles the challenge of generating narrative multi-shot videos with flexible shot arrangement and controllability. It proposes MultiShotMaster, which extends a pretrained single-shot model with two novel RoPE variants and is trained on data from an automated annotation pipeline. The authors report superior performance and controllability in their experiments.

Current video generation techniques excel at single-shot clips but struggle to produce narrative multi-shot videos, which require flexible shot arrangement, coherent narrative, and controllability beyond text prompts. To tackle these challenges, we propose MultiShotMaster, a framework for highly controllable multi-shot video generation. We extend a pretrained single-shot model by integrating two novel variants of RoPE. First, we introduce Multi-Shot Narrative RoPE, which applies an explicit phase shift at shot transitions, enabling flexible shot arrangement while preserving the temporal narrative order. Second, we design Spatiotemporal Position-Aware RoPE to incorporate reference tokens and grounding signals, enabling spatiotemporal-grounded reference injection. In addition, to overcome data scarcity, we establish an automated data annotation pipeline to extract multi-shot videos, captions, cross-shot grounding signals, and reference images. Our framework leverages the intrinsic architectural properties to support multi-shot video generation, featuring text-driven inter-shot consistency, customized subjects with motion control, and background-driven customized scenes. Both shot count and duration are flexibly configurable. Extensive experiments demonstrate the superior performance and outstanding controllability of our framework.
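To make the phase-shift idea concrete, here is a minimal sketch of one way a shot-aware temporal index could feed a standard 1D rotary embedding. It is not the authors' implementation. The abstract does not say how the shift is computed or applied, so the fixed per-shot offset, the `shift` parameter, and the function names `shot_aware_positions` and `rope_rotate` are all assumptions made for illustration.

```python
import math


def shot_aware_positions(shot_lengths, shift):
    """Return one temporal position per frame.

    Assumption (not from the paper): every shot transition adds a fixed
    offset `shift` to the running frame index, so frames stay in narrative
    order while shot boundaries show up as a larger positional gap.
    """
    positions = []
    frame_index = 0
    for shot_index, length in enumerate(shot_lengths):
        for _ in range(length):
            positions.append(float(frame_index) + shot_index * shift)
            frame_index += 1
    return positions


def rope_rotate(vec, position, base=10000.0):
    """Standard 1D RoPE: rotate each consecutive pair of `vec` by an angle
    proportional to `position`, with a per-pair frequency base**(-i/dim)."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    rotated = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        rotated.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return rotated


# Two shots of 3 and 2 frames with a hypothetical shift of 4 positions.
positions = shot_aware_positions([3, 2], shift=4.0)
query = [1.0, 0.0, 0.0, 1.0]
rotated_queries = [rope_rotate(query, p) for p in positions]
```

In this sketch, consecutive frames inside a shot are one position apart, while the first frame of a new shot sits `1 + shift` positions after the last frame of the previous shot. Because the rotation preserves vector norms, only the relative phase between tokens changes. Under this assumption, that phase difference is how attention could tell frames within one shot apart from frames on either side of a cut.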
