AI
Jul 12, 2021

Zero-shot Visual Question Answering using Knowledge Graph

arXiv:2107.05348v4
89 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for robust VQA systems that handle answer bias and error propagation in real-world applications. Its contribution is incremental, since it builds on existing knowledge graph and zero-shot approaches.

The paper tackles the problem of zero-shot visual question answering with unseen answers by proposing a method using knowledge graphs and mask-based learning, achieving state-of-the-art performance on zero-shot splits and improving existing models on the normal F-VQA task.

Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with separate components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when any component performs poorly, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, we propose a Zero-shot VQA algorithm that uses knowledge graphs and a mask-based learning mechanism to better incorporate external knowledge, and we present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method achieves state-of-the-art performance in Zero-shot VQA with unseen answers, while also dramatically improving existing end-to-end models on the normal F-VQA task.
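
As a rough illustration of what an answer-based zero-shot split implies, the sketch below partitions question-answer samples so that no answer in the test set ever appears in training. The function name, the `answer` field, and the `unseen_fraction` parameter are hypothetical choices made for this example. This is not the paper's actual split procedure for F-VQA.

```python
import random
from collections import defaultdict


def answer_based_zero_shot_split(samples, unseen_fraction=0.5, seed=0):
    """Split samples so that train and test share no answers.

    `samples` is a list of dicts with an "answer" key (an assumed format).
    """
    # Group samples by their answer string.
    by_answer = defaultdict(list)
    for sample in samples:
        by_answer[sample["answer"]].append(sample)

    # Shuffle the distinct answers deterministically.
    answers = sorted(by_answer)
    random.Random(seed).shuffle(answers)

    # Hold out a fraction of the answers (at least one) as "unseen".
    n_unseen = max(1, int(len(answers) * unseen_fraction))
    unseen = set(answers[:n_unseen])

    train = [s for a in answers if a not in unseen for s in by_answer[a]]
    test = [s for a in answers if a in unseen for s in by_answer[a]]
    return train, test


if __name__ == "__main__":
    toy = [
        {"question": "q0", "answer": "cat"},
        {"question": "q1", "answer": "dog"},
        {"question": "q2", "answer": "cat"},
        {"question": "q3", "answer": "car"},
    ]
    train, test = answer_based_zero_shot_split(toy)
    print(len(train), len(test))
```

Under such a split, a classifier restricted to answers seen in training cannot answer any test question correctly, which is why the setting calls for external knowledge about the unseen answers.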

Code Implementations: 2 repos