CV, CL, LG | Jul 11, 2019

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

arXiv:1907.05092v1 | 13 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of generating accurate and diverse captions for multiple events in a video, which matters for video understanding applications. The contribution is incremental: it builds on existing methods and explores them systematically.

The paper tackles dense captioning of events in long untrimmed videos by exploring different captioning models with various contexts, achieving state-of-the-art performance with a METEOR score of 9.91 on the challenge testing set.

Contextual reasoning is essential to understanding events in long untrimmed videos. In this work, we systematically explore different captioning models with various contexts for the dense-captioning events in videos task, which aims to generate captions for the different events in an untrimmed video. We propose five types of contexts as well as two categories of event captioning models, and evaluate their contributions to event captioning in terms of both accuracy and diversity. The proposed captioning models are plugged into our pipeline system for the dense video captioning challenge. The overall system achieves state-of-the-art performance on the task, with a METEOR score of 9.91 on the challenge testing set.
