CV · CL · IR · Jun 23, 2014

VideoSET: Video Summary Evaluation through Text

arXiv:1406.5824v1 · 74 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the computer vision community's need for better evaluation metrics in video summarization, though the contribution is incremental: it adapts existing NLP techniques to a new domain.

The paper tackles the problem of evaluating video summaries by proposing VideoSET, a text-based method that measures semantic retention compared to human-written ground-truth summaries, showing higher agreement with human judgment than pixel-based metrics.

In this paper we present VideoSET, a method for Video Summary Evaluation through Text that can evaluate how well a video summary is able to retain the semantic information contained in its original video. We observe that semantics is most easily expressed in words, and develop a text-based approach for the evaluation. Given a video summary, a text representation of the video summary is first generated, and an NLP-based metric is then used to measure its semantic distance to ground-truth text summaries written by humans. We show that our technique has higher agreement with human judgment than pixel-based distance metrics. We also release text annotations and ground-truth text summaries for a number of publicly available video datasets, for use by the computer vision community.
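The scoring step described in the abstract, comparing a text representation of a video summary against human-written ground-truth text summaries, can be illustrated with a minimal sketch. The abstract does not name the NLP metric, so the sketch assumes a simple n-gram recall as a stand-in. The function names (`ngrams`, `ngram_recall`, `score_summary`), the whitespace tokenization, and the choice to take the maximum over references are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngrams(text, n=1):
    """Count the n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(candidate, reference, n=1):
    """Fraction of the reference's n-grams that also appear in the candidate.

    Assumed stand-in metric: clipped n-gram recall. Higher means the
    candidate text retains more of the reference's content.
    """
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's credit at the number of times the candidate has it.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def score_summary(candidate, references, n=1):
    """Score a candidate against several references (assumption: take the best match)."""
    return max(ngram_recall(candidate, ref, n) for ref in references)
```

For example, `score_summary("a man opens the door", ["the man opens a red door"])` returns 5/6, since five of the six reference words appear in the candidate. This sketch returns a similarity; a distance in the abstract's sense could be taken as one minus this score.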

