CL CV
Apr 19, 2018

Video based Contextual Question Answering

Akash Ganesan, Divyansh Pal, Karthik Muthuraman, Shubham Dash

arXiv:1804.07399v1
0.52 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for contextual understanding of video in applications such as surveillance and content analysis, but it appears incremental because it generalizes existing image-based methods to video.

The paper tackles the problem of extending image-based question answering to videos by proposing a graphical representation to handle spatial and temporal queries, such as locating objects and describing actions across video frames.

The primary aim of this project is to build a contextual Question-Answering model for videos. Current methodologies provide a robust model for image-based Question-Answering, and we aim to generalize this approach to videos. We propose a graphical representation of video that can handle several types of queries across the whole video. For example, if a frame shows a man and a cat sitting, the model should be able to handle queries such as "Where is the cat sitting with respect to the man?" or "What is the man holding in his hand?". It should also be able to answer queries about temporal relationships.
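To make the idea of a graphical video representation concrete, here is a minimal Python sketch. The abstract does not specify the form of the graph, so everything here is an assumption made for illustration: each frame is modelled as a list of (subject, predicate, object) relations, and the names `Relation`, `FrameGraph`, `first_frame`, and `happens_before`, the predicate labels, and the "cup" object are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relation:
    """One edge of a frame's graph: subject --predicate--> obj (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class FrameGraph:
    """Relations detected in a single video frame."""
    index: int
    relations: List[Relation]

    def match(self, subject=None, predicate=None, obj=None) -> List[Relation]:
        """Return relations matching the given fields; None acts as a wildcard."""
        return [
            r for r in self.relations
            if (subject is None or r.subject == subject)
            and (predicate is None or r.predicate == predicate)
            and (obj is None or r.obj == obj)
        ]


def first_frame(video: List[FrameGraph], **pattern) -> Optional[int]:
    """Index of the first frame containing a relation that matches the pattern."""
    for frame in video:
        if frame.match(**pattern):
            return frame.index
    return None


def happens_before(video: List[FrameGraph], first: dict, second: dict) -> bool:
    """Temporal query: does `first` appear in an earlier frame than `second`?"""
    a = first_frame(video, **first)
    b = first_frame(video, **second)
    return a is not None and b is not None and a < b


# Toy two-frame video; the contents are invented for illustration only.
video = [
    FrameGraph(0, [Relation("cat", "sitting_left_of", "man")]),
    FrameGraph(1, [Relation("cat", "sitting_left_of", "man"),
                   Relation("man", "holding", "cup")]),
]

# Spatial query: where is the cat sitting with respect to the man?
spatial = video[0].match(subject="cat", obj="man")

# Object query: what is the man holding?
holding = video[1].match(subject="man", predicate="holding")
```

Under these assumptions, a spatial question becomes a pattern match within one frame, while a temporal question compares the frame indices at which two patterns first appear.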
