CV | AI | IR | Jul 4, 2023

Query-based Video Summarization with Pseudo Label Supervision

arXiv:2307.01945v1 | 17 citations | h-index: 47
Originality: Incremental advance
AI Analysis

This work addresses data sparsity in video summarization for applications like video retrieval, but it is incremental as it builds on existing self-supervision methods.

The paper tackles the problem of limited labeled data for query-based video summarization by introducing segment-level pseudo labels and a semantics booster to improve query-dependent summaries, achieving state-of-the-art performance on three benchmarks.

Existing datasets for manually labelled query-based video summarization are costly and thus small, limiting the performance of supervised deep video summarization models. Self-supervision can address the data sparsity challenge by using a pretext task and defining a method to acquire extra data with pseudo labels to pre-train a supervised deep model. In this work, we introduce segment-level pseudo labels from input videos to properly model both the relationship between a pretext task and a target task, and the implicit relationship between the pseudo label and the human-defined label. The pseudo labels are generated based on existing human-defined frame-level labels. To create more accurate query-dependent video summaries, a semantics booster is proposed to generate context-aware query representations. Furthermore, we propose mutual attention to help capture the interactive information between visual and textual modalities. Three commonly-used video summarization benchmarks are used to thoroughly validate the proposed approach. Experimental results show that the proposed video summarization algorithm achieves state-of-the-art performance.
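As a rough illustration of deriving segment-level pseudo labels from frame-level labels, the sketch below groups binary frame labels into fixed-length segments and marks a segment relevant when enough of its frames are relevant. The abstract does not state the aggregation rule, so the function name `segment_pseudo_labels`, the fixed `segment_len`, and the `threshold` vote are all assumptions made for illustration, not the authors' method.

```python
from typing import List


def segment_pseudo_labels(
    frame_labels: List[int],
    segment_len: int,
    threshold: float = 0.5,
) -> List[int]:
    """Aggregate binary frame-level labels into segment-level pseudo labels.

    Hypothetical rule (an assumption, not taken from the paper): a segment
    is labeled 1 when the fraction of its frames labeled 1 is at least
    `threshold`, otherwise 0.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")

    labels = []
    for start in range(0, len(frame_labels), segment_len):
        # The last segment may be shorter than segment_len.
        segment = frame_labels[start:start + segment_len]
        ratio = sum(segment) / len(segment)
        labels.append(1 if ratio >= threshold else 0)
    return labels


if __name__ == "__main__":
    frames = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
    print(segment_pseudo_labels(frames, segment_len=4))
```

Labels produced this way could serve as targets for a pre-training stage before fine-tuning on the human-defined frame-level labels; how the paper actually couples the two stages is not specified in the abstract.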

