CV · Sep 23, 2020

Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering

arXiv:2009.11118v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses visual question answering for AI systems, but it is incremental as it builds on existing joint modality methods.

The paper aims to improve visual question answering by using question-type prior knowledge to constrain the answer search space, and reports state-of-the-art performance on the VQA 2.0 and TDIUC benchmarks.

Many approaches to Visual Question Answering (VQA) have been proposed. However, few works examine how different joint modality methods behave with respect to question-type prior knowledge extracted from data when constraining the answer search space, even though this information gives a reliable cue for reasoning about answers to questions asked about input images. In this paper, we propose a novel VQA model that uses question-type prior information to improve VQA by leveraging multiple interactions between different joint modality methods, based on their behavior in answering questions of different types. Experiments on two benchmark datasets, VQA 2.0 and TDIUC, indicate that the proposed method yields the best performance compared with the most competitive approaches.
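To make the idea of "constraining the answer search space" concrete, here is a minimal sketch in plain Python. It is not the paper's model, which learns interactions between joint modality methods. It only illustrates the general notion of a question-type prior estimated from data and used to reweight candidate answers. The function names (`build_type_prior`, `constrained_answer`), the toy training pairs, and the multiplicative reweighting are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_type_prior(samples):
    """Estimate P(answer | question_type) from (question_type, answer) pairs.

    Hypothetical helper: the counting scheme is an assumption, not the paper's.
    """
    counts = defaultdict(Counter)
    for qtype, answer in samples:
        counts[qtype][answer] += 1
    prior = {}
    for qtype, answer_counts in counts.items():
        total = sum(answer_counts.values())
        prior[qtype] = {a: n / total for a, n in answer_counts.items()}
    return prior


def constrained_answer(scores, qtype, prior):
    """Return the best answer after reweighting scores by the type prior.

    `scores` maps candidate answers to non-negative model scores.
    Answers never seen with this question type get weight 0, which
    effectively removes them from the search space.
    """
    type_prior = prior.get(qtype)
    if not type_prior:
        # Unknown question type: fall back to the unconstrained argmax.
        return max(scores, key=scores.get)
    reweighted = {a: s * type_prior.get(a, 0.0) for a, s in scores.items()}
    if all(v == 0.0 for v in reweighted.values()):
        # No candidate is compatible with the prior: fall back as well.
        return max(scores, key=scores.get)
    return max(reweighted, key=reweighted.get)


# Toy (question_type, answer) pairs standing in for training annotations.
train = [
    ("how many", "2"), ("how many", "3"), ("how many", "2"),
    ("what color", "red"), ("what color", "blue"),
    ("is there", "yes"), ("is there", "no"),
]
prior = build_type_prior(train)

# Hypothetical model scores in which a color answer wrongly ranks first.
scores = {"red": 0.5, "2": 0.3, "3": 0.15, "yes": 0.05}
best = constrained_answer(scores, "how many", prior)
```

In this toy case the unconstrained argmax would be "red", while the prior for "how many" questions leaves only the numeric answers as candidates.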

Code Implementations: 1 repo
