CV · Jan 12, 2023

Learning to Summarize Videos by Contrasting Clips

arXiv:2301.05213v3 · 1 citation · h-index: 65
Originality: Incremental advance
AI Analysis

This paper addresses scalable video summarization for applications that need efficient content extraction, but the contribution is incremental because it builds on existing contrastive methods.

The paper tackles unsupervised video summarization by proposing a contrastive learning method that contrasts top-k features instead of mean features, enabling meaningful and diverse summaries without labeled data.

Video summarization aims at choosing parts of a video that narrate a story as close as possible to the original one. Most of the existing video summarization approaches focus on hand-crafted labels. As the number of videos grows exponentially, there is an increasing need for methods that can learn meaningful summarizations without labeled annotations. In this paper, we aim to maximally exploit unsupervised video summarization while concentrating the supervision on a few personalized labels as an add-on. To do so, we formulate the key requirements for informative video summarization. Then, we propose contrastive learning as the answer to both questions. To further boost Contrastive video Summarization (CSUM), we propose to contrast top-k features instead of a mean video feature as employed by the existing method, which we implement with a differentiable top-k feature selector. Our experiments on several benchmarks demonstrate that our approach allows for meaningful and diverse summaries when no labeled data is provided.
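The abstract does not spell out the top-k selector or the loss, so the following is only a minimal sketch of the general idea, under stated assumptions. It swaps in a successive-softmax relaxation of top-k (a common stand-in, not necessarily the paper's selector) and an InfoNCE-style loss with cosine similarity. All function names, the temperature values, and the toy data are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_top_k(scores, k, tau=0.05):
    # Assumed relaxation: k rounds of softmax. After each round, items that
    # already received weight are penalized by log(1 - weight), so the next
    # round concentrates on the next-highest score.
    alpha = list(scores)
    rounds = []
    for _ in range(k):
        weights = softmax([a / tau for a in alpha])
        rounds.append(weights)
        alpha = [a + math.log(max(1.0 - w, 1e-9)) for a, w in zip(alpha, weights)]
    return rounds  # k weight vectors, each summing to 1

def select_features(features, scores, k, tau=0.05):
    # Each "selected" feature is a weighted average of all frame features,
    # so in an autodiff framework gradients could reach every score.
    dim = len(features[0])
    return [
        [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        for weights in soft_top_k(scores, k, tau)
    ]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def info_nce(anchor, positive, negatives, temperature=0.1):
    # InfoNCE-style loss: low when the anchor is closer to the positive
    # than to any negative.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

# Toy example: four frame features with hypothetical importance scores.
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
frame_scores = [0.1, 0.9, 0.2, 0.6]
top_features = select_features(frame_features, frame_scores, k=2)
```

With a small `tau` the weights approach a hard top-k selection, while a larger `tau` blends more frames into each selected feature. Contrasting the k selected features individually, rather than their mean, is the part that corresponds to the top-k idea described in the abstract.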

