Generative AI for Video Trailer Synthesis: From Extractive Heuristics to Autoregressive Creativity

arXiv:2604.04953
AI Analysis

For researchers and practitioners in video generation and multimedia, this survey provides a comprehensive technical review and taxonomy of generative AI for trailer synthesis, though it is a literature review rather than a novel contribution.

This survey reviews the paradigm shift in automatic video trailer generation from heuristic-based extraction to deep generative synthesis, analyzing techniques like autoregressive Transformers and text-to-video models (e.g., Sora, Veo). It establishes a new taxonomy and suggests future systems will move toward controllable generative editing.

Automatic video trailer generation is undergoing a paradigm shift from heuristic-based extraction to deep generative synthesis. Early methods relied on low-level feature engineering, visual saliency, and rule-based heuristics to select representative shots. Recent advances in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), and diffusion-based video synthesis have enabled systems that not only identify key moments but also construct coherent, emotionally resonant narratives. This survey provides a comprehensive technical review of this evolution, focusing on generative techniques including autoregressive Transformers, LLM-orchestrated pipelines, and text-to-video foundation models such as OpenAI's Sora and Google's Veo. We analyze the architectural progression from Graph Convolutional Networks (GCNs) to Trailer Generation Transformers (TGT), evaluate the economic implications of automated content velocity on User-Generated Content (UGC) platforms, and discuss the ethical challenges posed by high-fidelity neural synthesis. By synthesizing insights from recent literature, this survey establishes a new taxonomy for AI-driven trailer generation in the era of foundation models and suggests that future promotional video systems will move beyond extractive selection toward controllable generative editing and semantic reconstruction of trailers.
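
For readers unfamiliar with the extractive baseline that the abstract contrasts with generative synthesis, the following is a minimal sketch of saliency-based shot selection. It is not taken from the survey: the `Shot` record, the precomputed `saliency` score, the `budget` parameter, and the greedy rule are all illustrative assumptions standing in for the "rule-based heuristics" mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """Hypothetical shot record; every field is an illustrative assumption."""
    start: float     # start time in seconds
    end: float       # end time in seconds
    saliency: float  # precomputed visual-saliency score in [0, 1]


def select_shots(shots: list[Shot], budget: float) -> list[Shot]:
    """Greedy extractive selection (an assumed heuristic, not the survey's).

    Visit shots from most to least salient, keep each one that still fits
    within `budget` seconds, then restore chronological order.
    """
    chosen: list[Shot] = []
    used = 0.0
    for shot in sorted(shots, key=lambda s: s.saliency, reverse=True):
        duration = shot.end - shot.start
        if used + duration <= budget:
            chosen.append(shot)
            used += duration
    return sorted(chosen, key=lambda s: s.start)


# Toy input: four contiguous shots with made-up saliency scores.
shots = [
    Shot(0.0, 4.0, 0.2),
    Shot(4.0, 7.0, 0.9),
    Shot(7.0, 12.0, 0.6),
    Shot(12.0, 14.0, 0.8),
]
trailer = select_shots(shots, budget=6.0)
```

A selector of this kind can only reorder and trim existing footage. It has no notion of narrative structure, which is the limitation the generative approaches reviewed in the survey aim to address.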
