CV, AI, CL, MM | Mar 2, 2022

Recent, rapid advancement in visual question answering architecture: a review

arXiv:2203.01322v4 | 9 citations | h-index: 21
AI Analysis

This is an incremental review paper summarizing developments in VQA architectures for researchers in multimodal AI.

This paper reviews recent rapid advancements in visual question answering (VQA) system architectures, highlighting the importance of multimodal approaches and building upon prior work by Manmadhan et al. (2020) with subsequent updates.

Understanding visual question answering is going to be crucial for numerous human activities. However, it presents major challenges at the heart of the artificial intelligence endeavor. This paper presents an update on the rapid advancements in image-based visual question answering over the last couple of years. A large body of research on improving visual question answering system architectures has been published recently, showing the importance of multimodal architectures. The review paper by Manmadhan et al. (2020), on which the present article builds, lists several benefits of visual question answering; this article extends it with subsequent updates in the field.

