CV · Jul 16, 2017

Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach

arXiv:1707.04960v1 · 152 citations
Originality: Incremental advance
AI Analysis

This work addresses personalized video summarization for users with diverse preferences. It offers a new summarization method and evaluation framework, although it builds incrementally on existing query-focused techniques.

The paper tackles user subjectivity in video summarization by adopting a query-focused approach, in which user preferences enter the summarization process as text queries, and reports better performance than existing summarizers. It also contributes a new dataset with dense per-shot concept annotations and an evaluation metric defined on these semantic concepts rather than on visual features or temporal overlap, which makes it easier to evaluate summarizers in a way that reflects what humans perceive.
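
The concept-based evaluation summarized above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' released code: each shot is represented by the set of concept tags annotated for it, two shots are compared by the intersection-over-union (IoU) of their concept sets, and a system summary is aligned with a user summary by maximum-weight bipartite matching, from which precision, recall, and F1 follow. The function names are hypothetical, and the brute-force matcher is exponential in summary length, so a practical evaluator would use the Hungarian algorithm instead.

```python
from itertools import permutations

# Minimal sketch of a concept-based summary evaluation (assumed design):
# shots are sets of concept tags, compared by IoU, aligned by max-weight matching.


def concept_iou(a, b):
    """Intersection-over-union of two shots' concept sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_weight_matching(sys_shots, user_shots):
    """Best total IoU over one-to-one shot pairings.

    Brute force over all assignments: exponential, for illustration only.
    """
    if len(sys_shots) <= len(user_shots):
        small, large = sys_shots, user_shots
    else:
        small, large = user_shots, sys_shots
    best = 0.0
    # Each assignment maps every shot of the smaller summary to a distinct shot of the larger one.
    for assignment in permutations(range(len(large)), len(small)):
        total = sum(concept_iou(small[i], large[j]) for i, j in enumerate(assignment))
        best = max(best, total)
    return best


def concept_f1(sys_shots, user_shots):
    """Return (precision, recall, F1) of a system summary against a user summary."""
    if not sys_shots or not user_shots:
        return 0.0, 0.0, 0.0
    weight = max_weight_matching(sys_shots, user_shots)
    precision = weight / len(sys_shots)
    recall = weight / len(user_shots)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, with system shots {car, street} and {tree} against user shots {car, street, sky}, {tree}, and {dog}, the best matching pairs the first two shots of each summary (IoU 2/3 and 1), giving precision 5/6, recall 5/9, and F1 2/3.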

Recent years have witnessed a resurgence of interest in video summarization. However, one of the main obstacles to research on video summarization is user subjectivity: users have different preferences over the summaries. This subjectivity causes at least two problems. First, no single video summarizer fits all users unless it interacts with and adapts to individual users. Second, it is very challenging to evaluate the performance of a video summarizer. To tackle the first problem, we explore the recently proposed query-focused video summarization, which introduces user preferences into the summarization process in the form of text queries about the video. We propose a sequential determinantal point process parameterized by a memory network in order to attend the user query to different video frames and shots. To address the second challenge, we contend that a good evaluation metric for video summarization should focus on the semantic information that humans can perceive rather than on visual features or temporal overlaps. To this end, we collect dense per-video-shot concept annotations, compile a new dataset, and suggest an efficient evaluation method defined upon the concept annotations. We conduct extensive experiments comparing our video summarizer with existing ones and present detailed analyses of the dataset and the new evaluation method.
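
To make the sequential determinantal point process (SeqDPP) concrete, the following pure-Python sketch computes the probability of one selection step. Everything here is an illustrative assumption: the kernel uses the standard quality-diversity form L_ij = q_i · cos(f_i, f_j) · q_j with fixed, hypothetical shot features f and query-relevance scores q, whereas the paper derives its kernel from a memory network that attends the query to frames and shots. Given the shots kept from the previous segment, the probability of choosing a subset of the current segment is the conditional DPP det(L[prev ∪ chosen]) / det(L[prev ∪ segment] + I_t), where I_t has ones only on the diagonal entries of the current segment.

```python
import math

# Minimal SeqDPP sketch. The kernel below is a hypothetical stand-in:
# the paper parameterizes its kernel with a memory network over the query.


def det(m):
    """Determinant by Gaussian elimination with partial pivoting (empty matrix -> 1)."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def cosine(u, v):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def build_kernel(features, quality):
    """Quality-diversity kernel L_ij = q_i * cos(f_i, f_j) * q_j (assumed form)."""
    n = len(features)
    return [[quality[i] * cosine(features[i], features[j]) * quality[j]
             for j in range(n)] for i in range(n)]


def seqdpp_step_prob(L, prev_selected, segment, chosen):
    """P(Y_t = chosen | Y_{t-1} = prev_selected) for one SeqDPP step.

    prev_selected and segment are disjoint lists of shot indices;
    chosen must be a subset of segment.
    """
    num_idx = list(prev_selected) + list(chosen)
    numerator = det([[L[i][j] for j in num_idx] for i in num_idx])
    ground = list(prev_selected) + list(segment)
    seg = set(segment)
    # Identity only on current-segment items: previously selected shots are forced in.
    denom = [[L[i][j] + (1.0 if i == j and i in seg else 0.0) for j in ground]
             for i in ground]
    return numerator / det(denom)
```

Near-duplicate shots make the numerator determinant small. With features [1, 0], [0.99, 0.14], and [0, 1] and equal quality, once shot 0 has been kept, its near-copy shot 1 is much less likely to be chosen than the visually distinct shot 2, and the probabilities over all subsets of a segment sum to 1. With an empty previous selection the step reduces to an ordinary DPP over the segment.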
