CV · Apr 12

A Benchmark and Multi-Agent System for Instruction-driven Cinematic Video Compilation

arXiv:2604.10456 · 97.51 citations · h-index: 12
Predicted impact: top 4% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses the lack of a benchmark and effective system for automatic cinematic video compilation, benefiting video editors and content creators.

CineBench is the first benchmark for instruction-driven cinematic video compilation, and CineAgents, a multi-agent system, outperforms existing methods in narrative and logical coherence.

The surging demand for adapting long-form cinematic content into short videos has created a need for versatile automatic video compilation systems. However, existing compilation methods are limited to predefined tasks, and the community lacks a comprehensive benchmark for evaluating cinematic compilation. To address this, we introduce CineBench, the first benchmark for instruction-driven cinematic video compilation, featuring diverse user instructions and high-quality ground-truth compilations annotated by professional editors. To overcome contextual collapse and temporal fragmentation, we present CineAgents, a multi-agent system that reformulates cinematic video compilation as a "design-and-compose" paradigm. CineAgents performs script reverse-engineering to construct a hierarchical narrative memory that provides multi-level context, and employs an iterative narrative planning process that refines a creative blueprint into a final compiled script. Extensive experiments demonstrate that CineAgents significantly outperforms existing methods, generating compilations with superior narrative and logical coherence.

