CL · CV · Dec 14, 2020

Movie Summarization via Sparse Graph Construction

arXiv:2012.07536v1 · 37 citations
AI Analysis

This work addresses automatic summarization of long videos, specifically movies, for viewers who want to grasp the storyline quickly. It offers an incremental improvement over existing methods.

This paper tackles the problem of summarizing full-length movies into shorter videos by identifying and assembling 'turning point' scenes. The proposed model, which constructs a sparse movie graph using multimodal information, generates summaries that human judges rate as more informative and complete than those from sequence-based and general-purpose summarization algorithms.

We summarize full-length movies by creating shorter videos containing their most informative scenes. We explore the hypothesis that a summary can be created by assembling scenes which are turning points (TPs), i.e., key events in a movie that describe its storyline. We propose a model that identifies TP scenes by building a sparse movie graph that represents relations between scenes and is constructed using multimodal information. According to human judges, the summaries created by our approach are more informative and complete, and receive higher ratings, than the outputs of sequence-based models and general-purpose summarization algorithms. The induced graphs are interpretable, displaying different topology for different movie genres.
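To make the idea of a sparse scene graph concrete, here is a minimal sketch in Python. It is not the authors' model: the abstract does not say how edges are scored or how TP scenes are picked, so this sketch assumes cosine similarity over hypothetical per-scene feature vectors, keeps only each scene's k most similar neighbours to sparsify the graph, and uses in-degree as a stand-in for scene importance. All function names and the toy data are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sparse_scene_graph(features, k=2):
    """Assumed sparsification: each scene keeps edges only to its k most similar scenes."""
    n = len(features)
    graph = {}
    for i in range(n):
        sims = [(cosine(features[i], features[j]), j) for j in range(n) if j != i]
        # Highest similarity first; ties broken by scene index for determinism.
        sims.sort(key=lambda s: (-s[0], s[1]))
        graph[i] = [j for _, j in sims[:k]]
    return graph


def select_scenes(graph, num_scenes):
    """Stand-in for TP identification: pick the scenes with the highest in-degree."""
    in_degree = {i: 0 for i in graph}
    for neighbours in graph.values():
        for j in neighbours:
            in_degree[j] += 1
    ranked = sorted(graph, key=lambda i: (-in_degree[i], i))
    # Return the chosen scenes in chronological order, as a summary video would play them.
    return sorted(ranked[:num_scenes])


# Hypothetical multimodal scene embeddings (one short vector per scene).
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]

graph = sparse_scene_graph(features, k=1)
summary = select_scenes(graph, num_scenes=2)
print(graph)
print(summary)
```

A fixed k keeps the graph sparse no matter how long the movie is, which is one simple way to obtain a graph whose topology can be inspected. A learned edge scorer could replace `cosine` without changing the rest of the sketch.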

Code Implementations: 1 repo