CV Sep 28, 2016

Video Summarization using Deep Semantic Features

Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

arXiv:1609.08758v112.4130 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of giving users a quick overview of Internet video content. Its advance is incremental, since it builds on existing clustering-based summarization methods.

The paper tackles the challenge of summarizing diverse Internet videos by using deep semantic features that encode objects, actions, and scenes. This improves the efficiency of standard techniques, as demonstrated on the SumMe dataset.

This paper presents a video summarization technique for Internet videos that provides a quick way to overview their content. This is a challenging problem because finding the important or informative parts of the original video requires understanding its content. Furthermore, the content of Internet videos is very diverse, ranging from home videos to documentaries, which makes video summarization much harder because prior knowledge is rarely available. To tackle this problem, we propose to use deep video features that can encode various levels of content semantics, including objects, actions, and scenes, improving the efficiency of standard video summarization techniques. For this, we design a deep neural network that maps videos as well as descriptions to a common semantic space and jointly train it with associated pairs of videos and descriptions. To generate a video summary, we extract the deep features from each segment of the original video and apply a clustering-based summarization technique to them. We evaluate our video summaries on the SumMe dataset and compare them with baseline approaches. The results demonstrate the advantages of incorporating our deep semantic features in a video summarization technique.
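
The last step of the abstract (cluster the per-segment features, then build the summary from the clusters) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here, and the function names `kmeans` and `summarize`, the evenly spaced initialization, and the toy 2-D vectors standing in for deep semantic features are all hypothetical.

```python
import math


def kmeans(features, k, iters=20):
    """Plain Lloyd's k-means over a list of equal-length feature vectors.

    Assumption: centroids start at evenly spaced segments, which keeps the
    sketch deterministic; the paper may initialize differently.
    """
    n = len(features)
    centroids = [list(features[i * n // k]) for i in range(k)]
    assign = [0] * n
    for _ in range(iters):
        # Assign each segment to its nearest centroid.
        assign = [
            min(range(k), key=lambda c: math.dist(f, centroids[c]))
            for f in features
        ]
        # Move each centroid to the mean of its member segments.
        for c in range(k):
            members = [f for f, a in zip(features, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def summarize(features, k):
    """Return, in temporal order, the segment index nearest each cluster center."""
    centroids, assign = kmeans(features, k)
    picked = set()
    for c, center in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            picked.add(min(members, key=lambda i: math.dist(features[i], center)))
    return sorted(picked)


# Toy stand-ins for per-segment deep features: two visually distinct "scenes".
segment_features = [
    [0.2, 0.0], [0.0, 0.0], [-0.2, 0.0],
    [5.0, 5.0], [5.0, 5.3], [5.0, 4.7],
]
summary_segments = summarize(segment_features, k=2)
```

In this sketch the summary keeps one representative segment per cluster, so semantically similar segments collapse into a single pick; in practice the feature vectors would come from the trained video-description embedding network rather than hand-written numbers.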
