CV LG NE
Nov 7, 2016

Memory-augmented Attention Modelling for Videos

Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli

arXiv:1611.02261v4
6.0
20 citations
Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the problem of generating accurate video descriptions for applications such as accessibility and video indexing. The contribution appears incremental, since it builds on existing attention-based methods.

The paper tackles video description generation by modeling higher-order interactions between frames and concepts using memory-augmented attention, resulting in improved performance on MSVD and Charades datasets without external temporal features.

We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts. By storing past visual attention in the video associated to previously generated words, the system is able to decide what to look at and describe in light of what it has already looked at and described. This enables not only more effective local attention, but tractable consideration of the video sequence while generating each word. Evaluation on the challenging and popular MSVD and Charades datasets demonstrates that the proposed architecture outperforms previous video description approaches without requiring external temporal video features.
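The idea of attending "in light of what it has already looked at" can be illustrated with a toy sketch. This is not the paper's architecture, which uses learned layers. It is a minimal illustration that assumes a fixed dot-product score and a running-average memory. All names here (`attend_with_memory`, `update_memory`, `decay`) and the toy frame features are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend_with_memory(frames, query, memory):
    """Score each frame against the current query AND a memory vector
    that summarizes what was attended to at earlier word steps.
    (Assumption: additive dot-product scoring, no learned weights.)"""
    scores = [dot(f, query) + dot(f, memory) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    context = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return weights, context


def update_memory(memory, context, decay=0.5):
    """Fold the newly attended context into the memory as a running average.
    (Assumption: a fixed decay stands in for a learned update.)"""
    return [decay * m + (1.0 - decay) * c for m, c in zip(memory, context)]


# Toy data: three frames with 2-d features, two decoding steps.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for decoder states
memory = [0.0, 0.0]
history = []

for query in queries:
    weights, context = attend_with_memory(frames, query, memory)
    memory = update_memory(memory, context)
    history.append(weights)
```

In this toy run, the second step's attention depends on the first step's context through `memory`, so the weights differ from what the second query alone would produce. A learned model would replace both the scoring and the update rule with trained parameters.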
