CV · CL · LG · Dec 5, 2018

Summarizing Videos with Attention

arXiv:1812.01969v2 · 242 citations
AI Analysis

This work targets efficient and effective video summarization. It is incremental in that it builds on existing attention mechanisms.

The authors tackled video summarization by proposing a self-attention-based network that simplifies implementation and reduces computational demands compared to existing methods, achieving new state-of-the-art results on the TvSum and SumMe benchmarks.

In this work we propose a novel method for supervised, keyshot-based video summarization by applying a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bi-directional recurrent networks such as BiLSTM combined with attention. These networks are complex to implement and computationally demanding compared to fully connected networks. To that end, we propose a simple, self-attention-based network for video summarization which performs the entire sequence-to-sequence transformation in a single feed-forward pass and a single backward pass during training. Our method sets new state-of-the-art results on two benchmarks commonly used in this domain, TvSum and SumMe.
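
The single-pass idea described in the abstract can be illustrated with a minimal NumPy sketch. It uses generic scaled dot-product self-attention followed by a sigmoid scoring layer. This is an assumption made for illustration and not the authors' exact architecture. The function name `self_attention_scores`, the weight names, and all dimensions are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_scores(features, w_q, w_k, w_v, w_out):
    """Map per-frame features (T, D) to per-frame importance scores (T,).

    Every frame attends to every other frame in one matrix product,
    so the whole sequence is processed in a single forward pass with
    no recurrence.
    """
    q = features @ w_q                        # (T, H) queries
    k = features @ w_k                        # (T, H) keys
    v = features @ w_v                        # (T, H) values
    logits = (q @ k.T) / np.sqrt(q.shape[1])  # (T, T) pairwise similarities
    attn = softmax(logits, axis=-1)           # each row sums to 1 (soft attention)
    context = attn @ v                        # (T, H) attention-weighted frames
    # Hypothetical scoring head: linear projection plus sigmoid.
    scores = 1.0 / (1.0 + np.exp(-(context @ w_out)))
    return scores, attn


# Toy input: T frames with D-dimensional features (illustrative sizes only).
rng = np.random.default_rng(0)
T, D, H = 8, 16, 4
features = rng.normal(size=(T, D))
w_q = rng.normal(size=(D, H)) * 0.1
w_k = rng.normal(size=(D, H)) * 0.1
w_v = rng.normal(size=(D, H)) * 0.1
w_out = rng.normal(size=(H,)) * 0.1

scores, attn = self_attention_scores(features, w_q, w_k, w_v, w_out)
print(scores.shape, attn.shape)
```

In a supervised setting, scores like these would be regressed against annotated frame importance, and keyshots would then be selected from the highest-scoring segments. Because there is no recurrent state, the cost is dominated by a few dense matrix products over the whole sequence.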

Code Implementations: 5 repos