CV · Mar 31, 2018

Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

arXiv:1804.00100v2 · 233 citations
Originality: Incremental advance
AI Analysis

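This work improves dense video captioning, a task relevant to video analysis applications. The advance is incremental: it builds on existing proposal and captioning methods and adds three specific enhancements (bidirectional proposals, attentive fusion, and context gating).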

The paper tackles two challenges in dense video captioning: using both past and future context when predicting event proposals, and distinguishing events that end at nearly the same time. The resulting model outperforms prior state-of-the-art methods on the ActivityNet Captions dataset, raising the METEOR score from 4.82 to 9.65.
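As a quick check on the reported numbers, the relative gain implied by the two METEOR scores can be computed directly. The helper name `relative_gain` is only illustrative and does not come from the paper.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# METEOR scores quoted in the summary: 4.82 (previous best) and 9.65 (this model).
gain = relative_gain(4.82, 9.65)
print(f"Relative gain: {gain:.1%}")
```

The result is roughly 100.2%, which matches the abstract's claim of a relative gain of over 100%.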

Dense video captioning is a newly emerging task that aims at both localizing and describing all events in a video. We identify and tackle two challenges on this task, namely, (1) how to utilize both past and future contexts for accurate event proposal predictions, and (2) how to construct informative input to the decoder for generating natural event descriptions. First, previous works predominantly generate temporal event proposals in the forward direction, which neglects future video context. We propose a bidirectional proposal method that effectively exploits both past and future contexts to make proposal predictions. Second, different events ending at (nearly) the same time are indistinguishable in the previous works, resulting in the same captions. We solve this problem by representing each event with an attentive fusion of hidden states from the proposal module and video contents (e.g., C3D features). We further propose a novel context gating mechanism to balance the contributions from the current event and its surrounding contexts dynamically. We empirically show that our attentively fused event representation is superior to the proposal hidden states or video contents alone. By coupling proposal and captioning modules into one unified framework, our model outperforms the state-of-the-arts on the ActivityNet Captions dataset with a relative gain of over 100% (Meteor score increases from 4.82 to 9.65).
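The context gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a sigmoid gate computed from linear projections of the event and context vectors, and the names `context_gate`, `W_event`, `W_context`, and `bias` are hypothetical. The paper's exact gate inputs and parameterization may differ.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def context_gate(event, context, W_event, W_context, bias):
    """Assumed gating form: g = sigmoid(W_event @ event + W_context @ context + bias).

    Returns the gate and the gated concatenation [g * event ; (1 - g) * context],
    so each dimension trades off the current event against its surrounding context.
    """
    pre = [e + c + b for e, c, b in
           zip(matvec(W_event, event), matvec(W_context, context), bias)]
    gate = [sigmoid(p) for p in pre]
    gated_event = [g * e for g, e in zip(gate, event)]
    gated_context = [(1.0 - g) * c for g, c in zip(gate, context)]
    return gate, gated_event + gated_context


# Toy example with random vectors and weights (illustrative values only).
random.seed(0)
dim = 4
event = [random.uniform(-1, 1) for _ in range(dim)]
context = [random.uniform(-1, 1) for _ in range(dim)]
W_event = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
W_context = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
bias = [0.0] * dim

gate, fused = context_gate(event, context, W_event, W_context, bias)
print("gate: ", gate)
print("fused:", fused)
```

Because the gate is recomputed from the inputs each time, the balance between event and context changes with the input instead of being a fixed weight, which is the sense in which the abstract calls the mechanism dynamic.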

Code Implementations: 1 repo
