CV | Apr 9, 2018

Viewpoint-aware Video Summarization

Atsushi Kanehira, Luc Van Gool, Yoshitaka Ushiku, Tatsuya Harada

arXiv:1804.02843v26.339 citations | h-index: 167

Originality: Incremental advance

AI Analysis

This paper addresses the need for personalized video summarization for users whose video collections are organized into groups. The contribution is incremental: it extends existing summarization techniques with a viewpoint-aware variant.

This paper tackles the problem of generating video summaries that depend on a viewer's specific focus, or viewpoint. It uses semantic similarity within groups of videos and dissimilarity between groups to produce summaries that are diverse, representative, and discriminative. The proposed method, inspired by Fisher's discriminant criterion, optimizes a combination of inner-summary, inner-group, and between-group variances of the summary's feature representation. Experiments on a newly constructed dataset are reported to demonstrate its effectiveness, though the abstract gives no concrete numerical results.

This paper introduces a novel variant of video summarization, namely building a summary that depends on the particular aspect of a video the viewer focuses on. We refer to this aspect as the $\textit{viewpoint}$. To infer what the desired $\textit{viewpoint}$ may be, we assume that several other videos are available, in particular groups of videos, e.g., folders on a person's phone or laptop. The semantic similarity between videos in a group versus the dissimilarity between groups is used to produce $\textit{viewpoint}$-specific summaries. To account for similarity while avoiding redundancy, the output summary should be (A) diverse, (B) representative of videos in the same group, and (C) discriminative against videos in different groups. To satisfy requirements (A)-(C) simultaneously, we propose a novel video summarization method that operates on multiple groups of videos. Inspired by Fisher's discriminant criterion, it selects a summary by optimizing a combination of three terms, (a) inner-summary, (b) inner-group, and (c) between-group variances, defined on the feature representation of the summary, which provide a simple representation of (A)-(C). Moreover, we develop a novel dataset to investigate how well the generated summary reflects the underlying $\textit{viewpoint}$. Quantitative and qualitative experiments conducted on the dataset demonstrate the effectiveness of the proposed method.
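The interplay of the three variance terms can be illustrated with a minimal sketch. It is not the formulation from the paper: the additive objective, the weights `w_group` and `w_between`, the use of the mean segment feature as the summary representation, and the greedy selection are all assumptions made for illustration, and the function names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vec(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def summary_score(selected, group_mean, other_means, w_group=1.0, w_between=1.0):
    """Score a candidate summary (list of segment features); higher is better.

    Assumed objective: reward (a) and (c), penalize (b).
    """
    center = mean_vec(selected)
    # (a) inner-summary variance: spread of the chosen segments (diversity).
    inner_summary = sum(sq_dist(s, center) for s in selected) / len(selected)
    # (b) inner-group variance: gap between the summary and its own group.
    inner_group = sq_dist(center, group_mean)
    # (c) between-group variance: mean gap between the summary and other groups.
    between_group = sum(sq_dist(center, m) for m in other_means) / len(other_means)
    return inner_summary - w_group * inner_group + w_between * between_group


def greedy_summary(segments, group_mean, other_means, k):
    """Greedily pick up to k segment indices that maximize summary_score."""
    chosen = []
    for _ in range(min(k, len(segments))):
        remaining = [i for i in range(len(segments)) if i not in chosen]
        best = max(
            remaining,
            key=lambda i: summary_score(
                [segments[j] for j in chosen + [i]], group_mean, other_means
            ),
        )
        chosen.append(best)
    return sorted(chosen)


# Toy 2-D features: two segments near the own-group mean, two near another group.
segments = [[0.0, 1.0], [0.0, -1.0], [10.0, 0.0], [9.0, 0.0]]
print(greedy_summary(segments, [0.0, 0.0], [[10.0, 0.0]], k=2))  # [0, 1]
```

In this toy example the two selected segments differ from each other (diverse), average to the own-group mean (representative), and lie far from the other group's mean (discriminative). Segments 2 and 3 would add spread, but they pull the summary toward the other group, so they score lower.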
