CV · Apr 12

A Benchmark and Multi-Agent System for Instruction-driven Cinematic Video Compilation

Peixuan Zhang, Chang Zhou, Ziyuan Zhang, Hualuo Liu, Chunjie Zhang, Jingqi Liu, Xiaohui Zhou, Xi Chen, Shuchen Weng, Si Li, Boxin Shi

arXiv:2604.10456 · 97.5 · 1 citations · h-index: 12

Predicted impact: top 4% in CV · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the lack of a benchmark and effective system for automatic cinematic video compilation, benefiting video editors and content creators.

CineBench is the first benchmark for instruction-driven cinematic video compilation, and CineAgents, a multi-agent system, outperforms existing methods in narrative and logical coherence.

The surging demand for adapting long-form cinematic content into short videos has motivated the need for versatile automatic video compilation systems. However, existing compilation methods are limited to predefined tasks, and the community lacks a comprehensive benchmark for evaluating cinematic compilation. To address this, we introduce CineBench, the first benchmark for instruction-driven cinematic video compilation, featuring diverse user instructions and high-quality ground-truth compilations annotated by professional editors. To overcome contextual collapse and temporal fragmentation, we present CineAgents, a multi-agent system that reformulates cinematic video compilation into a "design-and-compose" paradigm. CineAgents performs script reverse-engineering to construct a hierarchical narrative memory that provides multi-level context, and employs an iterative narrative planning process that refines a creative blueprint into a final compiled script. Extensive experiments demonstrate that CineAgents significantly outperforms existing methods, generating compilations with superior narrative and logical coherence.
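The abstract describes CineAgents only at a high level. As a rough illustration of how a hierarchical narrative memory and an iterative plan-then-refine loop could fit together, here is a minimal Python sketch. It rests on loud assumptions: the three memory levels (film synopsis, scenes, shots), every class and function name (`Shot`, `Scene`, `NarrativeMemory`, `draft_blueprint`, `refine_blueprint`, `compose_script`), keyword matching as a stand-in for LLM agents, and the duration-budget refinement rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    start: float  # seconds into the source film
    end: float
    description: str


@dataclass
class Scene:
    summary: str
    shots: list[Shot] = field(default_factory=list)

    def duration(self) -> float:
        return sum(s.end - s.start for s in self.shots)


@dataclass
class NarrativeMemory:
    """Hypothetical three-level memory: film synopsis -> scenes -> shots."""
    synopsis: str
    scenes: list[Scene] = field(default_factory=list)

    def context(self, scene_idx: int) -> dict:
        # Multi-level context for one scene: global, scene-level, shot-level.
        scene = self.scenes[scene_idx]
        return {
            "film": self.synopsis,
            "scene": scene.summary,
            "shots": [s.description for s in scene.shots],
        }


def draft_blueprint(memory: NarrativeMemory, keywords: list[str]) -> list[int]:
    # Hypothetical stand-in for an LLM planner: pick scenes whose summary
    # mentions any instruction keyword (case-insensitive).
    return [
        i for i, scene in enumerate(memory.scenes)
        if any(k.lower() in scene.summary.lower() for k in keywords)
    ]


def refine_blueprint(memory: NarrativeMemory, blueprint: list[int],
                     max_seconds: float) -> list[int]:
    # Keep source order (to avoid temporal fragmentation), then iteratively
    # drop the longest selected scene until the plan fits the time budget.
    plan = sorted(set(blueprint))
    while plan and sum(memory.scenes[i].duration() for i in plan) > max_seconds:
        plan.remove(max(plan, key=lambda i: memory.scenes[i].duration()))
    return plan


def compose_script(memory: NarrativeMemory,
                   plan: list[int]) -> list[tuple[float, float]]:
    # Flatten the refined plan into an ordered list of (start, end) clips.
    return [(s.start, s.end) for i in plan for s in memory.scenes[i].shots]


# Usage with a toy, made-up film.
memory = NarrativeMemory(
    synopsis="A hero leaves home, survives a storm, and returns.",
    scenes=[
        Scene("Hero leaves home", [Shot(0.0, 10.0, "door closes")]),
        Scene("Storm hits the village",
              [Shot(100.0, 130.0, "waves"), Shot(130.0, 140.0, "roof falls")]),
        Scene("Hero returns home", [Shot(500.0, 520.0, "reunion")]),
    ],
)
plan = refine_blueprint(memory, draft_blueprint(memory, ["hero"]), max_seconds=25)
print(compose_script(memory, plan))  # [(0.0, 10.0)]
```

Keeping the refined plan in source order is one simple way to guard against temporal fragmentation; a real system would presumably let planning agents critique and revise the blueprint rather than apply a fixed trimming rule.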
