CV, AI, CL
Jul 4, 2017

DeepStory: Video Story QA by Deep Embedded Memory Networks

arXiv:1707.00836v1
183 citations
Originality: Incremental advance
AI Analysis

This paper addresses video QA for AI agents, aiming at better understanding of real-world video content. The advance is incremental because it builds on existing memory and attention methods.

The paper tackles video story question answering with Deep Embedded Memory Networks (DEMN), which reconstruct stories from a joint scene-dialogue stream and use attention to answer questions. It reports state-of-the-art results on the MovieQA benchmark and outperforms other QA models on Pororo, a new dataset introduced in the paper.

Question-answering (QA) on video content is a significant challenge for achieving human-level intelligence as it involves both vision and language in real-world settings. Here we demonstrate the possibility of an AI agent performing video story QA by learning from a large amount of cartoon videos. We develop a video-story learning model, i.e. Deep Embedded Memory Networks (DEMN), to reconstruct stories from a joint scene-dialogue video stream using a latent embedding space of observed data. The video stories are stored in a long-term memory component. For a given question, an LSTM-based attention model uses the long-term memory to recall the best question-story-answer triplet by focusing on specific words containing key information. We trained the DEMN on a novel QA dataset of a children's cartoon video series, Pororo. The dataset contains 16,066 scene-dialogue pairs from 20.5 hours of video, 27,328 fine-grained sentences for scene description, and 8,913 story-related QA pairs. Our experimental results show that the DEMN outperforms other QA models. This is mainly due to 1) the reconstruction of video stories in a combined scene-dialogue form that utilizes the latent embedding and 2) attention. DEMN also achieved state-of-the-art results on the MovieQA benchmark.
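
The recall-then-answer flow described in the abstract can be illustrated with a toy sketch. This is not the authors' DEMN: the learned latent embedding is replaced by a bag-of-words count vector, the LSTM-based attention is replaced by plain cosine similarity, and the story sentences, question, and candidate answers are invented for illustration. The function names (embed, cosine, answer) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(question, memory, candidates):
    """Pick a story from memory, then pick the best candidate answer."""
    q = embed(question)
    # Step 1: recall the stored story sentence most similar to the question.
    story = max(memory, key=lambda s: cosine(q, embed(s)))
    # Step 2: score each candidate against the question plus recalled story.
    context = embed(question + " " + story)
    best = max(candidates, key=lambda c: cosine(context, embed(c)))
    return story, best


# Invented toy "long-term memory" of reconstructed story sentences.
memory = [
    "the penguin plays with a ball in the snow",
    "the beaver bakes cookies in her kitchen",
    "the fox builds a robot in his workshop",
]
candidates = [
    "the beaver bakes cookies",
    "the beaver builds a robot",
    "the beaver plays with a ball",
]

story, best = answer("what does the beaver do in her kitchen", memory, candidates)
print(story)
print(best)
```

The two-step structure (retrieve a story, then rank question-story-answer combinations) mirrors the description above only loosely; a learned model would replace both similarity steps with trained components.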

