CV | CL | Jul 23, 2018

Question Relevance in Visual Question Answering

arXiv:1807.08435v1 | 7 citations
Originality: Incremental advance
AI Analysis

This work addresses the nonsensical answers that VQA systems give when posed with questions irrelevant to the image, an incremental improvement to the reliability of AI vision-language systems.

The paper tackles irrelevant questions in Visual Question Answering (VQA) by detecting whether a posed question is relevant to the input image. It uses a two-step approach: first determine whether the question is visual, then assess whether a visual question is relevant to the image. Results from LSTM RNN based models are compared with Logistic Regression, XGBoost, and multi-layer perceptron approaches.
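The two-step approach can be read as a simple cascade: the relevance check runs only if the question passes the visual check. The following is a minimal sketch of that control flow, not the paper's models. The names `check_question`, `toy_is_visual`, and `toy_is_relevant` are hypothetical, the image is reduced to a set of tags purely for illustration, and the keyword-based stand-ins take the place of the trained classifiers.

```python
import re
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class RelevanceResult:
    is_visual: bool
    is_relevant: bool


def tokens(question: str) -> list:
    # Lowercase the question and keep alphabetic words only.
    return re.findall(r"[a-z]+", question.lower())


def check_question(
    question: str,
    image_tags: Set[str],
    is_visual: Callable[[str], bool],
    is_relevant: Callable[[str, Set[str]], bool],
) -> RelevanceResult:
    # Step 1: non-visual questions are rejected without looking at the image.
    if not is_visual(question):
        return RelevanceResult(is_visual=False, is_relevant=False)
    # Step 2: only visual questions are checked against the image.
    return RelevanceResult(is_visual=True, is_relevant=is_relevant(question, image_tags))


# Toy stand-ins (assumptions for illustration, not the paper's classifiers).
def toy_is_visual(question: str) -> bool:
    visual_cues = {"color", "wearing", "holding", "picture", "many"}
    return any(word in visual_cues for word in tokens(question))


def toy_is_relevant(question: str, image_tags: Set[str]) -> bool:
    return any(word in image_tags for word in tokens(question))


if __name__ == "__main__":
    tags = {"dog", "grass"}
    for q in ["What color is the dog?", "What color is the umbrella?",
              "Who was the first president?"]:
        print(q, check_question(q, tags, toy_is_visual, toy_is_relevant))
```

In a real system, each stand-in would be replaced by a trained model, and a VQA answer would be produced only when both checks pass.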

Free-form and open-ended Visual Question Answering systems solve the problem of providing an accurate natural language answer to a question pertaining to an image. Current VQA systems do not evaluate if the posed question is relevant to the input image and hence provide nonsensical answers when posed with irrelevant questions to an image. In this paper, we solve the problem of identifying the relevance of the posed question to an image. We address the problem as two sub-problems. We first identify if the question is visual or not. If the question is visual, we then determine if it's relevant to the image or not. For the second problem, we generate a large dataset from existing visual question answering datasets in order to enable the training of complex architectures and model the relevance of a visual question to an image. We also compare the results of our Long Short-Term Memory Recurrent Neural Network based models to Logistic Regression, XGBoost and multi-layer perceptron based approaches to the problem.
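The abstract says a relevance dataset is generated from existing VQA datasets but does not say how. One plausible construction, shown here purely as an assumption, keeps each original (image, question) pair as a relevant example and pairs the same question with a different, randomly chosen image as an irrelevant example. The function name `build_relevance_pairs` and the tuple layout are hypothetical.

```python
import random
from typing import List, Tuple


def build_relevance_pairs(
    vqa_examples: List[Tuple[str, str]], seed: int = 0
) -> List[Tuple[str, str, int]]:
    """Turn (image_id, question) pairs into (image_id, question, label) triples.

    label 1: the question was originally asked about this image.
    label 0: the question is paired with a different image (assumed irrelevant).
    """
    rng = random.Random(seed)
    image_ids = sorted({image_id for image_id, _ in vqa_examples})
    pairs = []
    for image_id, question in vqa_examples:
        pairs.append((image_id, question, 1))
        other_images = [i for i in image_ids if i != image_id]
        if other_images:
            # Mismatched pairing: a noisy negative, since the question may
            # happen to apply to the other image as well.
            pairs.append((rng.choice(other_images), question, 0))
    return pairs


if __name__ == "__main__":
    examples = [
        ("img1", "What color is the dog?"),
        ("img2", "How many people are at the table?"),
        ("img3", "What is the man holding?"),
    ]
    for triple in build_relevance_pairs(examples):
        print(triple)
```

Negatives built this way are noisy, so a real pipeline would likely need filtering or human verification before training.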

