CV · Jun 14, 2018

From Trailers to Storylines: An Efficient Way to Learn from Movies

arXiv:1806.05341v1 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses the high computational cost of movie-based computer vision research. It is an incremental advance: it builds on existing methods by splitting learning across different data sources.

The paper tackles the computational challenge of learning vision models from full-length movies by proposing a framework that learns visual features from trailers and temporal structures from movies, achieving substantial training time reduction and effective feature extraction.

The millions of movies produced in human history are valuable resources for computer vision research. However, learning a vision model from movie data meets with serious difficulties. A major obstacle is the computational cost: a movie is often over one hour long, which is substantially longer than the short video clips that previous studies mostly focus on. In this paper, we explore an alternative approach to learning vision models from movies. Specifically, we consider a framework comprising a visual module and a temporal analysis module. Unlike conventional learning methods, the proposed approach learns these modules from different sets of data: the former from trailers and the latter from movies. This allows distinctive visual features to be learned within a reasonable budget while still preserving long-term temporal structures across an entire movie. We construct a large-scale dataset for this study and define a series of tasks on top of it. Experiments on this dataset show that the proposed method can substantially reduce the training time while obtaining highly effective features and coherent temporal structures.
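
To make the cost argument concrete, the following is a rough back-of-the-envelope sketch comparing how many frames a visual module would have to process when trained on full movies versus on trailers. Every number here (corpus size, 90-minute movies, 2.5-minute trailers, 1 sampled frame per second) is an illustrative assumption, not a figure from the paper, and the `Corpus` class and `visual_training_ratio` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    """A hypothetical video corpus used only for a rough cost estimate."""
    num_videos: int
    minutes_per_video: float
    sampled_fps: float  # frames sampled per second of video (assumed)

    def frames(self) -> int:
        # Total number of sampled frames across the whole corpus.
        return int(self.num_videos * self.minutes_per_video * 60 * self.sampled_fps)


def visual_training_ratio(movies: Corpus, trailers: Corpus) -> float:
    """How many times more frames the movie corpus contains than the trailer corpus."""
    return movies.frames() / trailers.frames()


# Assumed, illustrative values: same number of titles, same sampling rate.
movies = Corpus(num_videos=100, minutes_per_video=90.0, sampled_fps=1.0)
trailers = Corpus(num_videos=100, minutes_per_video=2.5, sampled_fps=1.0)

ratio = visual_training_ratio(movies, trailers)
print(f"movie frames:   {movies.frames()}")
print(f"trailer frames: {trailers.frames()}")
print(f"ratio:          {ratio:.1f}x")
```

Under these assumptions, the frame count for the visual module scales with the ratio of movie length to trailer length. Frame count is only a proxy for training time, and the temporal module still has to see full movies, so the estimate says nothing about the actual savings reported in the paper.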

Code Implementations: 1 repo
