CV · AI · IR · Jul 4, 2023

Causal Video Summarizer for Video Exploration

NVIDIA
arXiv:2307.01947v1 · 10 citations · h-index: 47
Originality: Highly original
AI Analysis

This work addresses a limitation of fixed video summaries for video exploration: a single summary cannot adapt to what an individual user wants to find. It proposes a more interactive, query-dependent approach.

The paper tackles the problem of generating user-specific video summaries by proposing a causality-based method for multi-modal video summarization, which improves accuracy by 5.4% and F1-score by 4.92% over the state-of-the-art method.
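The reported gains are in accuracy and F1-score. The text does not say exactly how these metrics are computed on this dataset; a common convention, assumed here, treats each frame as a binary label (1 if it belongs to the summary) and compares predicted labels with the ground truth. The function name `frame_metrics` is hypothetical, and this is a sketch of the standard definitions rather than the paper's evaluation code.

```python
def frame_metrics(pred: list[int], gold: list[int]) -> tuple[float, float]:
    """Frame-level accuracy and F1 for binary keyframe labels (1 = frame is in the summary).

    Hypothetical helper: assumes a per-frame binary labelling, which may differ
    from the exact protocol used by the dataset in the paper.
    """
    if len(pred) != len(gold):
        raise ValueError("prediction and ground truth must have the same length")
    if not gold:
        raise ValueError("need at least one frame")
    # Count true positives, false positives, and false negatives over frames.
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)
    correct = sum(1 for p, g in zip(pred, gold) if p == g)
    accuracy = correct / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


acc, f1 = frame_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(f"accuracy={acc:.2f}, F1={f1:.2f}")  # accuracy=0.60, F1=0.67
```

Accuracy rewards every correctly labelled frame, including the many frames correctly left out of a summary, while F1 focuses on the frames selected for the summary. That is why summarization work usually reports both.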

Recently, video summarization has been proposed as a way to aid video exploration. However, traditional video summarization models generate only a fixed summary, which is usually independent of a user's specific needs and therefore limits the effectiveness of video exploration. Multi-modal video summarization is one approach to this issue: it takes both a video and a text-based query as input. Effectively modeling the interaction between the video and the query is therefore essential to the task. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and the query for multi-modal video summarization. The proposed method consists of a probabilistic encoder and a probabilistic decoder. Evaluated on an existing multi-modal video summarization dataset, the proposed approach improves accuracy by 5.4% and F1-score by 4.92% compared with the state-of-the-art method.
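The abstract states only that CVS pairs a probabilistic encoder with a probabilistic decoder to capture the video-query interaction; it gives no architecture details. The sketch below is a generic, VAE-style illustration of that pattern under stated assumptions: frame and query features are fused by element-wise product, the encoder samples a latent with the reparameterization trick at a fixed variance, and the decoder maps the latent to a per-frame inclusion probability. All names (`encode`, `decode`, `summarize`) and feature vectors are hypothetical, and the sketch does not model the causal structure that gives CVS its name.

```python
import math
import random

# Hypothetical sketch, NOT the CVS architecture: a toy VAE-style scorer that
# conditions per-frame summary scores on a text-query embedding.


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def encode(frame: list[float], query: list[float], rng: random.Random) -> list[float]:
    """Assumed probabilistic encoder: fuse frame and query, then sample z ~ N(mu, sigma^2)."""
    mu = [f * q for f, q in zip(frame, query)]  # assumption: element-wise product as the interaction
    log_var = [-1.0] * len(mu)                   # assumption: fixed variance instead of a learned one
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def decode(z: list[float]) -> float:
    """Assumed probabilistic decoder: P(frame belongs to the summary | z)."""
    return sigmoid(sum(z))  # assumption: sum of latent dimensions as the logit


def summarize(frames, query, k, n_samples=16, seed=0):
    """Score every frame against the query and return the sorted top-k frame indices."""
    rng = random.Random(seed)
    scores = []
    for frame in frames:
        # Monte Carlo average of the decoder output over several latent samples.
        probs = [decode(encode(frame, query, rng)) for _ in range(n_samples)]
        scores.append(sum(probs) / n_samples)
    top = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top), scores


query = [1.0, 1.0, 0.0]
frames = [[2.0, 2.0, 0.0], [-2.0, -2.0, 0.0], [0.0, 0.0, 5.0], [1.5, 1.0, 0.0]]
selected, scores = summarize(frames, query, k=2)
print(selected, [round(s, 2) for s in scores])
```

Because the same frames receive different scores under a different query, the summary becomes query-dependent rather than fixed. Averaging over several latent samples is one simple way to turn a stochastic decoder into a stable ranking score.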
