CV · Nov 18, 2022

Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization

arXiv:2211.10056v1 · 17 citations · h-index: 26 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of inefficient heuristic objectives in unsupervised video summarization for video browsing, offering an incremental improvement with a novel application of contrastive learning.

The paper tackles unsupervised video summarization by using contrastive losses to quantify frame-level importance directly. Computed on pre-trained features alone, the resulting scores are competitive with or better than those of heavily trained methods, and a lightweight projection module improves them further.

Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Unsupervised methods usually rely on heuristic training objectives such as diversity and representativeness. However, such methods need to bootstrap the online-generated summaries to compute the objectives for importance score regression. We consider such a pipeline inefficient and seek to directly quantify the frame-level importance with the help of contrastive losses in the representation learning literature. Leveraging the contrastive losses, we propose three metrics featuring a desirable key frame: local dissimilarity, global consistency, and uniqueness. With features pre-trained on the image classification task, the metrics can already yield high-quality importance scores, demonstrating competitive or better performance than past heavily-trained methods. We show that by refining the pre-trained features with a lightweight contrastively learned projection module, the frame-level importance scores can be further improved, and the model can also leverage a large number of random videos and generalize to test videos with decent performance. Code available at https://github.com/pangzss/pytorch-CTVSUM.
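The abstract names three properties of a good key frame but gives no formulas for them. The sketch below is one plausible reading built on cosine similarity between frame features, not the paper's actual definitions. The function name `frame_importance`, the neighbourhood size `k`, the optional `reference` set of frames from other videos, and the product used to combine the three terms are all assumptions made for illustration; see the linked repository for the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def frame_importance(features, k=2, reference=None):
    """Illustrative frame scores (hypothetical helper, not the paper's code).

    features:  (n_frames, dim) array of pre-trained frame features, n_frames > k.
    reference: optional (m, dim) array of frames from other videos.
    """
    z = l2_normalize(np.asarray(features, dtype=np.float64))
    n = z.shape[0]
    sim = z @ z.T  # pairwise cosine similarity between frames

    # Local dissimilarity (assumed form): a frame that is far from even its
    # k most similar frames is not redundant with its neighbourhood.
    sim_no_self = sim.copy()
    np.fill_diagonal(sim_no_self, -np.inf)
    top_k = np.sort(sim_no_self, axis=1)[:, -k:]
    local_dissimilarity = 1.0 - top_k.mean(axis=1)

    # Global consistency (assumed form): mean similarity to all other frames
    # of the same video, i.e. how well the frame reflects the whole video.
    global_consistency = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)

    # Uniqueness (assumed form): low mean similarity to frames of other videos.
    if reference is None:
        uniqueness = np.ones(n)
    else:
        r = l2_normalize(np.asarray(reference, dtype=np.float64))
        uniqueness = 1.0 - (z @ r.T).mean(axis=1)

    # Assumed combination: a product, with consistency mapped into [0, 1].
    score = local_dissimilarity * (1.0 + global_consistency) / 2.0 * uniqueness
    return {
        "local_dissimilarity": local_dissimilarity,
        "global_consistency": global_consistency,
        "uniqueness": uniqueness,
        "score": score,
    }


# Toy input: three identical frames and one distinct frame.
toy = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
toy_scores = frame_importance(toy, k=1)
```

In the toy input the distinct frame gets the highest local dissimilarity and the lowest global consistency, which shows why the terms have to be balanced against each other rather than used alone.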

Code Implementations: 1 repo
