CV · AI · Nov 19, 2020

Logically Consistent Loss for Visual Question Answering

arXiv:2011.10094v1
AI Analysis

This work aims to improve the logical consistency of VQA models, which is a critical problem for making these systems more reliable and human-like in their reasoning, particularly for applications requiring robust and consistent responses.

This paper addresses the lack of logical consistency in neural-network-based Visual Question Answering (VQA) models, which struggle to give consistent answers across different question forms and semantic tasks. The authors propose a model-agnostic logic constraint, formulated as a logically consistent loss within a multi-task learning framework, together with two data organization strategies called family-batch and hybrid-batch. Experiments with MAC-net based VQA machines show that the proposed loss and hybrid-batch lead to improved consistency and better performance.

Given an image, background knowledge, and a set of questions about an object, human learners answer the questions very consistently regardless of question forms and semantic tasks. The current advancement in neural-network based Visual Question Answering (VQA), despite its impressive performance, cannot ensure such consistency due to the independent and identically distributed (i.i.d.) assumption. We propose a new model-agnostic logic constraint to tackle this issue by formulating a logically consistent loss in the multi-task learning framework, as well as a data organisation called family-batch and hybrid-batch. To demonstrate the usefulness of this proposal, we train and evaluate MAC-net based VQA machines with and without the proposed logically consistent loss and the proposed data organisation. The experiments confirm that the proposed loss formulae and the introduction of hybrid-batch lead to more consistency as well as better performance. Though the proposed approach is tested with MAC-net, it can be utilised in any other QA method whenever logical consistency between answers exists.
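
The abstract does not state the loss formula or the exact batch construction, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical pair of yes/no questions in which a "yes" to the specific question implies a "yes" to the general one, and it uses a simple hinge penalty (an assumption of this sketch) added to an ordinary cross-entropy task loss. The function names, the `family` key, and the `weight` parameter are all made up for illustration; hybrid-batch is not modelled here because the abstract does not define it.

```python
import math
from collections import defaultdict


def family_batches(samples):
    """Group samples so that each batch holds the questions of one family.

    'family' is a hypothetical key identifying related questions
    (for example, questions about the same object in the same image).
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["family"]].append(sample)
    return list(groups.values())


def binary_cross_entropy(p, label):
    """Standard binary cross-entropy for one predicted probability."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def implication_penalty(p_specific, p_general):
    """Hinge penalty for violating 'specific is yes => general is yes'.

    Zero when the general answer is at least as probable as the specific one.
    This is an assumed penalty form, not the formula from the paper.
    """
    return max(0.0, p_specific - p_general)


def consistent_loss(p_specific, y_specific, p_general, y_general, weight=1.0):
    """Task loss for both questions plus a weighted consistency penalty."""
    task = (binary_cross_entropy(p_specific, y_specific)
            + binary_cross_entropy(p_general, y_general))
    return task + weight * implication_penalty(p_specific, p_general)
```

Because the penalty depends on the answers to two related questions at once, both questions must appear in the same batch, which is one plausible motivation for grouping questions by family rather than sampling them independently.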
