CV · Jul 16, 2023

Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization

arXiv:2307.08145v1 · 14 citations · h-index: 43
Originality: Incremental advance
AI Analysis

This work addresses video summarization for applications like content analysis, but it is incremental as it builds on existing GAN methods by adding self-attention.

The paper tackles unsupervised video summarization by proposing a GAN-based model with self-attention for frame selection, achieving state-of-the-art performance on SumMe and competitive results on TVSum and COGNIMUSE datasets.

In this paper, we study the problem of producing a comprehensive video summary following an unsupervised approach that relies on adversarial learning. We build on a popular method in which a Generative Adversarial Network (GAN) is trained to create representative summaries that are indistinguishable from the originals. Introducing the attention mechanism into the architecture for the selection, encoding, and decoding of video frames demonstrates the efficacy of self-attention and transformers in modeling temporal relationships for video summarization. We propose the SUM-GAN-AED model, which uses a self-attention mechanism for frame selection combined with LSTMs for encoding and decoding. We evaluate the SUM-GAN-AED model on the SumMe, TVSum, and COGNIMUSE datasets. Experimental results indicate that using self-attention as the frame selection mechanism outperforms the state of the art on SumMe and achieves performance comparable to the state of the art on TVSum and COGNIMUSE.
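To make the frame-selection idea concrete, the following is a minimal sketch of how a single self-attention layer could assign an importance score to each frame. It is not the authors' SUM-GAN-AED implementation: the function names (`attention_frame_scores`, `select_frames`), the random projection weights, the feature sizes, and the 15% top-k budget are all assumptions made for illustration, and the LSTM encoder/decoder and the adversarial training loop are omitted entirely.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_frame_scores(features, w_q, w_k, w_v, w_out):
    """Score each frame with scaled dot-product self-attention.

    features: T x D frame features (hypothetical, e.g. CNN embeddings).
    w_q, w_k, w_v: D x d projection matrices; w_out: length-d vector.
    Returns T importance scores in (0, 1).
    """
    q = matmul(features, w_q)
    k = matmul(features, w_k)
    v = matmul(features, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    logits = matmul(q, k_t)  # T x T frame-to-frame similarities
    attn = [softmax([x / scale for x in row]) for row in logits]
    context = matmul(attn, v)  # each frame attends to every other frame
    # Project each context vector to a scalar and squash with a sigmoid.
    return [
        1.0 / (1.0 + math.exp(-sum(c * w for c, w in zip(row, w_out))))
        for row in context
    ]


def select_frames(scores, budget=0.15):
    """Keep the top-scoring fraction of frames (assumed budget), in time order."""
    n_keep = max(1, round(budget * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy data: 40 frames with 16-dimensional features and untrained random weights.
random.seed(0)
T, D, d = 40, 16, 8
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
w_q, w_k, w_v = (
    [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(D)] for _ in range(3)
)
w_out = [random.gauss(0, 0.1) for _ in range(d)]

scores = attention_frame_scores(features, w_q, w_k, w_v, w_out)
summary_idx = select_frames(scores, budget=0.15)
```

In an adversarial setup of the kind the abstract describes, scores like these would weight the frames before they are encoded and decoded, and the projection weights would be learned rather than sampled at random. The plain top-k selection here is a simplification chosen for brevity.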

