CL | CV | Oct 18, 2022

Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering

arXiv:2210.10176v2291 citations | h-index: 88
Originality: Incremental advance
AI Analysis

This work addresses the retrieval of specific external knowledge for visual question answering, which matters for systems whose answers depend on information outside the image. The contribution is incremental: it builds on the existing two-stage retrieve-then-answer framework.

The paper tackles the problem of inadequate knowledge retrieval in Outside-Knowledge Visual Question Answering (OK-VQA) by proposing an Entity-Focused Retrieval (EnFoRe) model, which achieves superior retrieval performance and sets a new state-of-the-art on the OK-VQA dataset.

Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge given the visual question and then predicts the answer based on the retrieved content. However, the retrieved knowledge is often inadequate. Retrievals are frequently too general and fail to cover specific knowledge needed to answer the question. Also, the naturally available supervision (whether the passage contains the correct answer) is weak and does not guarantee question relevancy. To address these issues, we propose an Entity-Focused Retrieval (EnFoRe) model that provides stronger supervision during training and recognizes question-relevant entities to help retrieve more specific knowledge. Experiments show that our EnFoRe model achieves superior retrieval performance on OK-VQA, the currently largest outside-knowledge VQA dataset. We also combine the retrieved knowledge with state-of-the-art VQA models, and achieve a new state-of-the-art performance on OK-VQA.
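To make the abstract's two ideas concrete, here is a minimal sketch of dense retrieval by dot-product scoring and of the weak "passage contains the answer" label. It is not the EnFoRe model. The passages, the two-dimensional vectors, and the helper names `retrieve` and `weak_label` are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product, the usual relevance score in dense retrieval."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec: Sequence[float],
             passage_vecs: Sequence[Sequence[float]],
             k: int) -> List[int]:
    """Return the indices of the k passages with the highest dot-product score."""
    ranked = sorted(range(len(passage_vecs)),
                    key=lambda i: dot(query_vec, passage_vecs[i]),
                    reverse=True)
    return ranked[:k]


def weak_label(passage: str, answers: Sequence[str]) -> bool:
    """Weak supervision: positive if the passage contains any answer string."""
    text = passage.lower()
    return any(a.lower() in text for a in answers)


# Toy data (assumed): question "What mineral is this fruit rich in?"
answers = ["potassium"]
passages = [
    "Bananas are rich in potassium.",    # relevant and contains the answer
    "Potassium has atomic number 19.",   # contains the answer, but off-topic
    "Fruit is part of a healthy diet.",  # on-topic but too general
]
# Hand-written embeddings standing in for learned encoders.
query_vec = [1.0, 0.5]
passage_vecs = [[0.9, 0.6], [0.1, 0.9], [0.7, 0.1]]

top_k = retrieve(query_vec, passage_vecs, k=2)
labels = [weak_label(p, answers) for p in passages]
print(top_k, labels)
```

In this toy setup the second passage receives a positive weak label even though it does not address the question, and the too-general third passage is retrieved ahead of it. These are the two failure modes the abstract describes.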

