CV, AI, CL, LG | Jul 27, 2022

Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base

arXiv:2207.13242v1 | 6 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of integrating external knowledge effectively in KVQA to improve question answering. It represents an incremental advance with specific gains.

The paper tackles the problem of irrelevant external knowledge confusing knowledge-based visual question answering (KVQA) by proposing a semantic inconsistency measure based on caption uncertainty and semantic similarity, and a new knowledge assimilation method. Their approach achieves state-of-the-art performance on the OK-VQA dataset.

The knowledge-based visual question answering (KVQA) task aims to answer questions that require external knowledge in addition to an understanding of the image and the question. Recent studies on KVQA inject external knowledge in a multi-modal form, but as more knowledge is used, irrelevant information may be added and can confuse the question answering. To use the knowledge properly, this study makes the following contributions: 1) we introduce a novel semantic inconsistency measure computed from caption uncertainty and semantic similarity; 2) we propose a new external knowledge assimilation method based on the semantic inconsistency measure and apply it to integrate explicit and implicit knowledge for KVQA; 3) we evaluate the proposed method on the OK-VQA dataset, where it achieves state-of-the-art performance.
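
The abstract does not give the formula for the semantic inconsistency measure, so the following is only a minimal sketch of how caption uncertainty and semantic similarity might be combined to filter external knowledge. Everything in it is an assumption made for illustration, not the authors' method: the uncertainty proxy (one minus the geometric mean of caption token probabilities), the use of cosine similarity, the weighted-sum combination with a hypothetical weight `alpha`, and the hypothetical helper names and `threshold` parameter.

```python
import math


def caption_uncertainty(token_probs):
    """Assumed proxy: 1 minus the geometric mean of caption token probabilities.

    Returns a value in [0, 1]; 0 means the captioner was fully confident.
    """
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return 1.0 - math.exp(mean_log_prob)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_inconsistency(token_probs, caption_vec, knowledge_vec, alpha=0.5):
    """Hypothetical combination: weighted sum of uncertainty and dissimilarity.

    alpha is an illustrative weight, not a value taken from the paper.
    """
    dissimilarity = 1.0 - cosine_similarity(caption_vec, knowledge_vec)
    return alpha * caption_uncertainty(token_probs) + (1.0 - alpha) * dissimilarity


def filter_knowledge(token_probs, caption_vec, candidates, threshold, alpha=0.5):
    """Keep knowledge entries whose inconsistency score is below the threshold.

    candidates is a list of (text, embedding) pairs.
    """
    kept = []
    for text, vec in candidates:
        score = semantic_inconsistency(token_probs, caption_vec, vec, alpha)
        if score < threshold:
            kept.append(text)
    return kept


if __name__ == "__main__":
    # Toy 2-D embeddings: one entry aligned with the caption, one orthogonal.
    probs = [0.9, 0.8]
    caption = [1.0, 0.0]
    knowledge = [("relevant fact", [1.0, 0.0]), ("irrelevant fact", [0.0, 1.0])]
    print(filter_knowledge(probs, caption, knowledge, threshold=0.4))
```

In this toy setup, a knowledge entry that points in the same direction as the caption embedding scores low and is kept, while an orthogonal entry scores high and is dropped. A real system would replace the toy vectors with embeddings from a sentence encoder and the token probabilities with the output of the captioning model.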

