CV, LG, NE
Nov 7, 2016

Memory-augmented Attention Modelling for Videos

arXiv:1611.02261v4
20 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating accurate video descriptions for applications such as accessibility and video indexing. It appears incremental because it builds on existing attention-based methods.

The paper tackles video description generation by modeling higher-order interactions between frames and concepts using memory-augmented attention. It reports improved performance on the MSVD and Charades datasets without using external temporal features.

We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts. By storing past visual attention in the video associated to previously generated words, the system is able to decide what to look at and describe in light of what it has already looked at and described. This enables not only more effective local attention, but tractable consideration of the video sequence while generating each word. Evaluation on the challenging and popular MSVD and Charades datasets demonstrates that the proposed architecture outperforms previous video description approaches without requiring external temporal video features.
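
The idea in the abstract, attending to frames in light of what has already been attended to and described, can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the additive scoring function, the tanh memory update, the dimensions, and every name (`attend_with_memory`, `update_memory`, `run_demo`, the weight matrices) are assumptions made for illustration, and a random vector stands in for a real decoder state.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attend_with_memory(frames, decoder_state, memory, W_f, W_h, W_m, v):
    """Score each frame using the decoder state AND a memory of past attention.

    frames: (T, D), decoder_state: (H,), memory: (M,)
    W_f: (D, A), W_h: (H, A), W_m: (M, A), v: (A,)
    Returns attention weights (T,) and the attended context vector (D,).
    """
    # Additive (Bahdanau-style) scoring; the memory term is the assumed extension.
    hidden = np.tanh(frames @ W_f + decoder_state @ W_h + memory @ W_m)  # (T, A)
    weights = softmax(hidden @ v)                                        # (T,)
    context = weights @ frames                                           # (D,)
    return weights, context


def update_memory(memory, context, decoder_state, U_c, U_h, U_m):
    """Fold the latest attended context and decoder state into the memory.

    A plain tanh recurrence is used here as a simplifying assumption.
    """
    return np.tanh(context @ U_c + decoder_state @ U_h + memory @ U_m)


def run_demo(steps=3, seed=0):
    """Run a few decoding steps on random data and return the attention history."""
    rng = np.random.default_rng(seed)
    T, D, H, M, A = 5, 8, 6, 4, 7  # frames, feature, decoder, memory, attention dims

    frames = rng.normal(size=(T, D))
    W_f, W_h, W_m = (rng.normal(size=s) for s in [(D, A), (H, A), (M, A)])
    v = rng.normal(size=A)
    U_c, U_h, U_m = (rng.normal(size=s) for s in [(D, M), (H, M), (M, M)])

    memory = np.zeros(M)  # nothing has been attended to yet
    weights_per_step = []
    for _ in range(steps):
        decoder_state = rng.normal(size=H)  # stand-in for a real language decoder
        weights, context = attend_with_memory(
            frames, decoder_state, memory, W_f, W_h, W_m, v
        )
        memory = update_memory(memory, context, decoder_state, U_c, U_h, U_m)
        weights_per_step.append(weights)
    return weights_per_step, memory


if __name__ == "__main__":
    history, final_memory = run_demo()
    for step, w in enumerate(history):
        print(step, np.round(w, 3))
```

Because the memory enters the scoring function, the attention weights at each step depend on which frames were attended to at earlier steps, not only on the current decoder state.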

Code Implementations: 1 repo
