CV, Aug 19, 2020

Learning Trailer Moments in Full-Length Movies

arXiv:2008.08502v1, 61 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient movie browsing for audiences by enabling automated key moment detection, though it is incremental as it builds on existing weak supervision methods.

The paper tackles the problem of detecting key moments in full-length movies without human annotations by using officially-released trailers as weak supervision, achieving superior performance over supervised approaches on a newly constructed dataset and improving state-of-the-art results on public benchmarks.

A movie's key moments stand out from the screenplay to grab an audience's attention and make movie browsing efficient. But a lack of annotations makes existing approaches inapplicable to movie key moment detection. To remove the need for human annotations, we leverage officially-released trailers as weak supervision to learn a model that can detect key moments in full-length movies. We introduce a novel ranking network that uses the Co-Attention between movies and trailers as guidance to generate training pairs, where moments highly correlated with trailers are expected to score higher than uncorrelated moments. Additionally, we propose a Contrastive Attention module that enhances the feature representations so that the contrast between the features of key and non-key moments is maximized. We construct the first movie-trailer dataset, and the proposed Co-Attention assisted ranking network shows superior performance even over the supervised approach. The effectiveness of our Contrastive Attention module is also demonstrated by its performance improvement over the state of the art on public benchmarks.
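The pair-generation and ranking idea in the abstract can be illustrated with a minimal sketch. This is not the paper's network. It assumes each movie moment and trailer clip is already a feature vector, and it substitutes a plain maximum cosine similarity for the learned Co-Attention. The function names (`trailer_correlation`, `make_pairs`, `ranking_loss`), the top-k/bottom-k selection rule, and the margin value are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def trailer_correlation(movie_feats, trailer_feats):
    """Stand-in for co-attention: each movie moment's best match among trailer clips."""
    return [max(cosine(m, t) for t in trailer_feats) for m in movie_feats]


def make_pairs(correlation, k):
    """Pair the k most trailer-correlated moments with the k least correlated ones."""
    order = sorted(range(len(correlation)), key=lambda i: correlation[i], reverse=True)
    top, bottom = order[:k], order[-k:]
    return [(pos, neg) for pos in top for neg in bottom]


def ranking_loss(scores, pairs, margin=1.0):
    """Mean pairwise hinge loss: each positive should outscore its negative by `margin`."""
    losses = [max(0.0, margin - scores[pos] + scores[neg]) for pos, neg in pairs]
    return sum(losses) / len(losses)


# Toy data: four movie moments and one trailer clip, as 2-D features.
movie_feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
trailer_feats = [[1.0, 0.0]]

pairs = make_pairs(trailer_correlation(movie_feats, trailer_feats), k=1)
predicted_scores = [2.0, 0.5, 1.0, 0.0]  # hypothetical model outputs per moment
print(pairs, ranking_loss(predicted_scores, pairs))
```

In this toy setup the loss is zero when the model already scores the trailer-like moment above the unrelated one by at least the margin, and it grows as that ordering is violated. A trained model would minimize this loss over many such pairs.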

