CL, CV · Apr 19, 2018

Video based Contextual Question Answering

arXiv:1804.07399v1 · 2 citations
Originality: Incremental advance
AI Analysis

This addresses the need for contextual understanding in videos for applications like surveillance or content analysis, but it appears incremental as it generalizes existing image methods to video.

The paper tackles the problem of extending image-based question answering to videos by proposing a graphical representation to handle spatial and temporal queries, such as locating objects and describing actions across video frames.

The primary aim of this project is to build a contextual question-answering model for videos. Current methodologies provide robust models for image-based question answering, but we aim to generalize this approach to videos. We propose a graphical representation of video that can handle several types of queries across the whole video. For example, if a frame shows a man and a cat sitting, the model should be able to answer queries such as "Where is the cat sitting with respect to the man?" or "What is the man holding in his hand?". It should also be able to answer queries about temporal relationships.
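The abstract does not say how the graph is built or queried. The following is only a rough illustration of how a graph over video frames could serve both kinds of query. It assumes that objects and their pairwise relations have already been detected in each frame, and it leaves that detection step out. The class `VideoSceneGraph`, its methods, and the relation labels are hypothetical and are not the authors' representation. Spatial questions become lookups on the edges of a single frame. Temporal questions compare the frames in which given edges appear.

```python
from collections import defaultdict


class VideoSceneGraph:
    """Hypothetical spatio-temporal scene graph (not the paper's design).

    Nodes are object labels; each edge is a (subject, relation, object)
    triple tagged with the frame index in which it was observed.
    """

    def __init__(self):
        # frame index -> set of (subject, relation, object) triples
        self.frames = defaultdict(set)

    def add_edge(self, frame, subject, relation, obj):
        self.frames[frame].add((subject, relation, obj))

    def relation_between(self, frame, subject, obj):
        """Spatial query: how is `subject` related to `obj` in one frame?"""
        edges = self.frames.get(frame, set())
        return sorted(r for s, r, o in edges if s == subject and o == obj)

    def objects_of(self, frame, subject, relation):
        """E.g. 'What is the man holding?' -> objects_of(f, 'man', 'holding')."""
        edges = self.frames.get(frame, set())
        return sorted(o for s, r, o in edges if s == subject and r == relation)

    def frames_where(self, subject, relation, obj):
        """Temporal query: in which frames does this triple hold?"""
        triple = (subject, relation, obj)
        return sorted(f for f, edges in self.frames.items() if triple in edges)

    def happens_before(self, first, second):
        """Temporal query: does triple `first` appear earlier than `second`?"""
        a = self.frames_where(*first)
        b = self.frames_where(*second)
        return bool(a) and bool(b) and a[0] < b[0]


# Usage with made-up detections for a short clip
g = VideoSceneGraph()
g.add_edge(0, "cat", "left of", "man")
g.add_edge(0, "man", "holding", "cup")
g.add_edge(5, "cat", "on", "sofa")

print(g.relation_between(0, "cat", "man"))  # ['left of']
print(g.objects_of(0, "man", "holding"))    # ['cup']
print(g.happens_before(("man", "holding", "cup"), ("cat", "on", "sofa")))  # True
```

Keying edges by frame index keeps per-frame spatial lookups cheap. Temporal questions then reduce to comparing the frame indices at which given triples are observed.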
