CV · MM · IV · Nov 20, 2020

SalSum: Saliency-based Video Summarization using Generative Adversarial Networks

arXiv:2011.10432v1
AI Analysis

This work provides a method for creating more perceptually compatible video summaries, which is beneficial for users dealing with large amounts of video data from surveillance, medical, and telecommunication systems.

This paper addresses the need for effective video summarization by proposing SalSum, a novel method based on a Generative Adversarial Network pre-trained with human eye fixations. SalSum achieves perceptually compatible video summaries by combining perceived color and spatiotemporal visual attention cues, outperforming state-of-the-art approaches with the highest f-measure score on the VSUMM benchmark.

The huge amount of video data produced daily by camera-based systems, such as surveillance, medical, and telecommunication systems, creates a need for effective video summarization (VS) methods. These methods should be capable of creating an overview of the video content. In this paper, we propose a novel VS method based on a Generative Adversarial Network (GAN) model pre-trained with human eye fixations. The main contribution of the proposed method is that it can provide perceptually compatible video summaries by combining both perceived color and spatiotemporal visual attention cues in an unsupervised scheme. Several fusion approaches are considered for robustness under uncertainty and for personalization. The proposed method is evaluated against state-of-the-art VS approaches on the benchmark dataset VSUMM. The experimental results show that SalSum outperforms the state-of-the-art approaches, achieving the highest f-measure score on the VSUMM benchmark.
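As a rough illustration of the f-measure used for the comparison above, the sketch below computes it as the harmonic mean of precision and recall for one automatic summary against one reference summary. The function name, its arguments, and the example counts are hypothetical, and how frames are matched between the two summaries is left outside the sketch because the abstract does not describe the evaluation protocol.

```python
def f_measure(n_matched: int, n_auto: int, n_user: int) -> float:
    """Harmonic mean of precision and recall for one summary pair.

    n_matched: automatic keyframes that match a reference keyframe
    n_auto:    total keyframes in the automatic summary
    n_user:    total keyframes in the reference (user) summary
    All three counts are assumed to come from a separate matching step.
    """
    if n_matched == 0 or n_auto == 0 or n_user == 0:
        return 0.0
    precision = n_matched / n_auto  # share of selected frames that are correct
    recall = n_matched / n_user     # share of reference frames that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 6 of 8 selected keyframes match a 10-frame reference.
score = f_measure(n_matched=6, n_auto=8, n_user=10)
print(f"{score:.3f}")
```

With these made-up counts, precision is 0.75 and recall is 0.6, so the score reflects both how selective and how complete the summary is.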

