MM · CV · Aug 1, 2018

From Thumbnails to Summaries - A single Deep Neural Network to Rule Them All

arXiv:1808.00184v1 · 14 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for efficient video summary generation for content creators, viewers, and advertisers, though it appears incremental because it builds on existing autoencoder and LSTM methods.

The paper tackles the problem of generating multiple forms of video summaries, such as thumbnails and storyboards. It proposes ReconstSum, an LSTM-based autoencoder framework that selects a sparse subset of frames to represent a video in an unsupervised manner, and reports that it outperforms state-of-the-art techniques in both use cases.

Video summaries come in many forms, from traditional single-image thumbnails and animated thumbnails to storyboards and trailer-like video summaries. Content creators use summaries to display the most attractive portion of their videos; viewers use them to quickly evaluate whether a video is worth watching. All forms of summaries are essential to video viewers, content creators, and advertisers. Video content management systems often have to generate multiple versions of summaries that vary in duration and presentational form. We present a framework, ReconstSum, that uses an LSTM-based autoencoder architecture to extract and select a sparse subset of video frames or keyshots that optimally represent the input video in an unsupervised manner. The encoder selects a subset from the input video, while the decoder seeks to reconstruct the video from the selection. The goal is to minimize the difference between the original input video and the reconstructed video. Our method is easily extended to a variety of applications, including static video thumbnails, animated thumbnails, storyboards, and "trailer-like" highlights. We specifically study and evaluate the two most popular use cases: thumbnail generation and storyboard generation. We demonstrate that our method generates better results than state-of-the-art techniques in both use cases.
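The select-then-reconstruct idea in the abstract can be illustrated with a much simpler stand-in. The sketch below is not ReconstSum and uses no LSTM or autoencoder: it assumes each frame is already a feature vector, "reconstructs" every frame as its nearest selected frame, and greedily picks the frames that minimize the mean squared reconstruction error. The function names (`reconstruction_error`, `greedy_select`) and the toy three-scene "video" are hypothetical and exist only for illustration.

```python
import numpy as np

def reconstruction_error(features, selected):
    """Mean squared distance from each frame to its nearest selected frame.

    Assumption: a frame is "reconstructed" by copying its nearest selected
    frame. This replaces the paper's learned decoder with a trivial one.
    """
    chosen = features[selected]                                  # (k, d)
    diffs = features[:, None, :] - chosen[None, :, :]            # (n, k, d)
    sq_dists = (diffs ** 2).sum(axis=2)                          # (n, k)
    return sq_dists.min(axis=1).mean()

def greedy_select(features, k):
    """Greedily pick k frame indices that minimize reconstruction error."""
    selected = []
    for _ in range(k):
        best_idx, best_err = None, np.inf
        for i in range(len(features)):
            if i in selected:
                continue
            err = reconstruction_error(features, selected + [i])
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
    return sorted(selected)

# Toy "video": three scenes of ten frames each, as 2-D feature vectors.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.vstack([c + 0.1 * rng.standard_normal((10, 2)) for c in centers])

thumbnail = greedy_select(features, k=1)    # one frame: a static thumbnail
storyboard = greedy_select(features, k=3)   # several frames: a storyboard

print("thumbnail frame:", thumbnail)
print("storyboard frames:", storyboard)
```

Changing only `k` switches between a single thumbnail and a multi-frame storyboard, which mirrors the abstract's point that one selection objective can serve several summary formats. A learned encoder and decoder, as in the paper, would replace both the greedy search and the nearest-frame reconstruction.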

