CV · Jan 14, 2016

Dynamic Concept Composition for Zero-Example Event Detection

arXiv:1601.03679v1 · 51 citations
Originality: Incremental advance
AI Analysis

The paper addresses event detection in videos when no visual training data is available, with applications in video analysis. The contribution is incremental, since it builds on existing zero-shot learning methods.

The paper tackles zero-example event detection in videos by learning, for each test video, the weights of the selected concept classifiers from online videos with free-form text descriptions. It reports superior results on the TRECVID MEDTest 2014, MEDTest 2013, and CCV datasets.

In this paper, we focus on automatically detecting events in unconstrained videos without the use of any visual training exemplars. In principle, zero-shot learning makes it possible to train an event detection model based on the assumption that events (e.g. "birthday party") can be described by multiple mid-level semantic concepts (e.g. "blowing candle", "birthday cake"). Towards this goal, we first pre-train a bundle of concept classifiers using data from other sources. Then we evaluate the semantic correlation of each concept with respect to the event of interest and select the relevant concept classifiers, which are applied to all test videos to get multiple prediction score vectors. While most existing systems combine the predictions of the concept classifiers with fixed weights, we propose to learn the optimal weights of the concept classifiers for each test video by exploring a set of videos available online with free-form text descriptions of their content. To validate the effectiveness of the proposed approach, we have conducted extensive experiments on the latest TRECVID MEDTest 2014, MEDTest 2013 and CCV datasets. The experimental results confirm the superiority of the proposed approach.
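To make the pipeline in the abstract concrete, the following is a minimal sketch of the fixed-weight baseline that the abstract contrasts with, not the paper's dynamic composition method. It assumes concept names and the event name have already been mapped to embedding vectors (the toy 2-D vectors, the concept names, and the helper names `select_concepts` and `fuse_fixed_weights` are all hypothetical), uses cosine similarity as a stand-in for semantic correlation, and assumes the selected relevances are positive so they can be normalised into weights.

```python
import numpy as np

def select_concepts(event_vec, concept_vecs, k):
    """Return indices and cosine relevances of the k concepts closest to the event."""
    event_unit = event_vec / np.linalg.norm(event_vec)
    concept_units = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    relevance = concept_units @ event_unit          # cosine similarity per concept
    top = np.argsort(-relevance)[:k]                # most relevant first
    return top, relevance[top]

def fuse_fixed_weights(scores, weights):
    """Combine per-concept scores (n_videos, k) with one weight vector shared by all videos."""
    weights = weights / weights.sum()               # assumes positive relevances
    return scores @ weights

# Toy data (assumed, for illustration only).
event_vec = np.array([1.0, 0.0])                    # e.g. "birthday party"
concept_vecs = np.array([
    [1.0, 0.0],                                     # e.g. "birthday cake"
    [0.6, 0.8],                                     # e.g. "blowing candle"
    [0.0, 1.0],                                     # an unrelated concept
])
# Rows are test videos, columns are concept classifier scores.
scores = np.array([
    [0.9, 0.8, 0.1],
    [0.2, 0.1, 0.9],
    [0.5, 0.4, 0.5],
])

top, relevance = select_concepts(event_vec, concept_vecs, k=2)
fused = fuse_fixed_weights(scores[:, top], relevance)
ranking = np.argsort(-fused)                        # videos ordered by event score
print(top, fused, ranking)
```

In this baseline every video shares the same weight vector. The approach described in the abstract would instead learn a separate weight vector for each test video, which this sketch does not attempt.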

