CV · AI · CL · LG · Sep 17, 2019

Inverse Visual Question Answering with Multi-Level Attentions

arXiv:1909.07583v2
AI Analysis

This paper addresses inverse visual question answering, the task of generating a question from an image and an answer. It represents an incremental improvement in a specific domain.

The paper tackles inverse visual question answering by proposing a deep multi-level attention model that generates regional visual and semantic features at the object level and enhances them with answer cues, achieving state-of-the-art performance on the VQA V1 dataset across multiple metrics.

In this paper, we propose a novel deep multi-level attention model to address inverse visual question answering. The proposed model generates regional visual and semantic features at the object level and then enhances them with the answer cue by using attention mechanisms. Two levels of multiple attentions are employed in the model, including the dual attention at the partial question encoding step and the dynamic attention at the next question word generation step. We evaluate the proposed model on the VQA V1 dataset. It demonstrates state-of-the-art performance in terms of multiple commonly used metrics.
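The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of attending over object-level region features with an answer cue. It is not the authors' architecture. The scaled dot-product scoring, the toy vectors, and the names `attend`, `answer_cue`, and `decoder_state` are all assumptions made for illustration. The first call uses one fixed answer-conditioned query; the second mimics a "dynamic" query that would be recomputed at each word-generation step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(regions, query):
    """Scaled dot-product attention (an assumed scoring function).

    Returns the attention weights over regions and the weighted
    sum of region vectors (the context vector).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(r, query) / scale for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, context


# Hypothetical toy data: three region features and an answer embedding.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
answer_cue = [0.0, 2.0, 0.0]
decoder_state = [0.0, 0.0, 2.0]  # stands in for the state at one generation step

# Answer-guided attention: the answer cue alone selects relevant regions.
enc_weights, enc_context = attend(regions, answer_cue)

# Step-dependent attention: the query combines the answer cue with the
# current decoder state, so the weights can change from word to word.
step_query = [a + d for a, d in zip(answer_cue, decoder_state)]
gen_weights, gen_context = attend(regions, step_query)
```

In this toy setup the answer cue pulls weight toward the second region, and adding the decoder state shifts part of that weight onto the third region. A real model would learn projections for the query and the region features rather than using raw vectors.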

