CL · Oct 7, 2022

Retrieval Augmented Visual Question Answering with Outside Knowledge

arXiv:2210.03809v2 · 334 citations · h-index: 26
Originality: Highly original
AI Analysis

This work addresses the challenge of retrieving external knowledge for visual question answering, offering improvements in answer quality and computational efficiency for the VQA community.

The paper tackles the problem of Outside-Knowledge Visual Question Answering (OK-VQA) by proposing a joint training scheme that integrates differentiable Dense Passage Retrieval (DPR) with answer generation, outperforming recent systems and reducing the number of retrieved documents needed in training.

Outside-Knowledge Visual Question Answering (OK-VQA) is a challenging VQA task that requires retrieval of external knowledge to answer questions about images. Recent OK-VQA systems use Dense Passage Retrieval (DPR) to retrieve documents from external knowledge bases, such as Wikipedia, but DPR is trained separately from answer generation, which potentially limits overall system performance. Instead, we propose a joint training scheme which includes differentiable DPR integrated with answer generation so that the system can be trained in an end-to-end fashion. Our experiments show that our scheme outperforms recent OK-VQA systems that use strong DPR for retrieval. We also introduce new diagnostic metrics to analyze how retrieval and generation interact. The strong retrieval ability of our model significantly reduces the number of retrieved documents needed in training, improving answer quality and reducing the computation required for training.
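End-to-end training of this kind hinges on letting the answer-generation loss reach the retriever's scores. The sketch below illustrates one standard way to do that, RAG-style marginalization over the top-K retrieved passages; this technique is assumed here for illustration, and the paper's own training objective may supervise retrieval differently. DPR relevance is modeled as the inner product of query and passage embeddings. All function names, the toy embeddings, and the generator log-probabilities are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """DPR-style relevance score: inner product of query and passage embeddings."""
    return sum(a * b for a, b in zip(u, v))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def top_k(query: Sequence[float],
          passages: Sequence[Sequence[float]],
          k: int) -> Tuple[List[int], List[float]]:
    """Return indices and scores of the k highest-scoring passages."""
    scores = [dot(query, p) for p in passages]
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)[:k]
    return order, [scores[i] for i in order]


def marginal_nll(retrieval_scores: Sequence[float],
                 answer_logprobs: Sequence[float]) -> float:
    """-log sum_k p(z_k | x) * p(y | x, z_k).

    p(z_k | x) is a softmax over the top-k retrieval scores and
    answer_logprobs[k] is log p(y | x, z_k) from the (hypothetical) generator.
    """
    joint = [s + a for s, a in zip(retrieval_scores, answer_logprobs)]
    return logsumexp(retrieval_scores) - logsumexp(joint)


def score_gradients(retrieval_scores: Sequence[float],
                    answer_logprobs: Sequence[float]) -> List[float]:
    """d(loss)/d(score_k) = p(z_k | x) - p(z_k | x, y).

    A negative entry means gradient descent raises that passage's score,
    because it explains the answer better than the retriever currently believes.
    """
    prior_norm = logsumexp(retrieval_scores)
    joint = [s + a for s, a in zip(retrieval_scores, answer_logprobs)]
    post_norm = logsumexp(joint)
    return [math.exp(s - prior_norm) - math.exp(j - post_norm)
            for s, j in zip(retrieval_scores, joint)]


if __name__ == "__main__":
    query = [1.0, 0.0]                                # toy question embedding
    passages = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]   # toy passage embeddings
    idx, scores = top_k(query, passages, k=2)
    # Hypothetical generator likelihoods of the gold answer given each passage.
    answer_logprobs = [math.log(0.8), math.log(0.1)]
    print(idx, marginal_nll(scores, answer_logprobs),
          score_gradients(scores, answer_logprobs))
```

In this formulation the generator is run once per retrieved passage, so K directly sets the per-question training cost; a retriever that ranks useful passages highly lets K stay small.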

Code Implementations: 1 repo
