CV · May 5, 2017

Unified Embedding and Metric Learning for Zero-Exemplar Event Detection

arXiv:1705.02148v1 · 14 citations
Originality: Incremental advance
AI Analysis

This paper addresses video retrieval for novel events without visual examples, representing an incremental improvement over existing concept-based approaches.

The paper tackles zero-exemplar event detection in videos by learning a joint embedding space for visual and textual representations, outperforming state-of-the-art methods on the TRECVID dataset.

Event detection in unconstrained videos is conceived as content-based video retrieval with two modalities: textual and visual. Given a text describing a novel event, the goal is to rank related videos accordingly. This task is zero-exemplar: no video examples are given for the novel event. Related works train a bank of concept detectors on external data sources. These detectors predict confidence scores for test videos, which are then ranked and retrieved accordingly. In contrast, we learn a joint space in which the visual and textual representations are embedded. The space casts a novel event as probabilities over pre-defined events. It also learns to measure the distance between an event and its related videos. Our model is trained end-to-end on the publicly available EventNet dataset. When applied to the TRECVID Multimedia Event Detection dataset, it outperforms the state of the art by a considerable margin.
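The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes the text query and each video have already been mapped into a shared space as probabilities over a small set of pre-defined events. The softmax mapping, the cosine distance, and all names and scores below (`softmax`, `cosine_distance`, `rank_videos`, the toy values) are illustrative assumptions, not the paper's model, which learns both the embedding and the metric end-to-end.

```python
import math


def softmax(scores):
    """Turn raw per-event scores into probabilities (assumed mapping)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_distance(u, v):
    """Stand-in for the learned metric: 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_videos(query_embedding, video_embeddings):
    """Return video ids sorted from closest to farthest from the query."""
    distances = {
        vid: cosine_distance(query_embedding, emb)
        for vid, emb in video_embeddings.items()
    }
    return sorted(distances, key=distances.get)


# Hypothetical scores over three pre-defined events.
query = softmax([2.0, 0.1, -1.0])
videos = {
    "video_a": softmax([1.8, 0.0, -0.5]),
    "video_b": softmax([-1.0, 2.0, 0.3]),
    "video_c": softmax([0.2, 0.1, 0.0]),
}
ranking = rank_videos(query, videos)
print(ranking)
```

Because no video examples of the novel event are involved, only the text query and the test videos enter the ranking, which is what makes the setting zero-exemplar.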

