Zero-shot Visual Question Answering using Knowledge Graph
This work addresses the need for robust VQA systems that handle answer bias and error propagation in real-world applications, though it is incremental in that it builds on existing knowledge graph and zero-shot approaches.
The paper tackles the problem of zero-shot visual question answering with unseen answers by proposing a method using knowledge graphs and mask-based learning, achieving state-of-the-art performance on zero-shot splits and improving existing models on the normal F-VQA task.
Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with separate components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when any component performs poorly, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, we propose a Zero-shot VQA algorithm that uses knowledge graphs and a mask-based learning mechanism to better incorporate external knowledge, and we present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method achieves state-of-the-art performance in Zero-shot VQA with unseen answers while also dramatically improving existing end-to-end models on the normal F-VQA task.
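To make the idea of an answer-based zero-shot split concrete, the following is a minimal sketch, not the procedure used in the paper. It partitions samples so that no answer in the test set ever appears in the training set, which is the "unseen answers" condition described above. The function name, the `answer` field, the `unseen_fraction` parameter, and the toy samples are all hypothetical and chosen only for illustration.

```python
import random


def answer_based_zero_shot_split(samples, unseen_fraction=0.5, seed=0):
    """Split samples so that test answers never occur in training.

    Hypothetical helper: each sample is assumed to be a dict with an
    "answer" key. This is an illustration, not the paper's split code.
    """
    # Collect the distinct answers in a deterministic order, then shuffle.
    answers = sorted({s["answer"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(answers)

    # Reserve a fraction of the answers as "unseen" (at least one).
    n_unseen = max(1, int(len(answers) * unseen_fraction))
    unseen = set(answers[:n_unseen])

    # Route each sample by whether its answer is reserved as unseen.
    train = [s for s in samples if s["answer"] not in unseen]
    test = [s for s in samples if s["answer"] in unseen]
    return train, test


# Toy data (made up for this example).
samples = [
    {"question": "What can this animal produce?", "answer": "milk"},
    {"question": "What is this object used for?", "answer": "sitting"},
    {"question": "What does this animal eat?", "answer": "grass"},
    {"question": "Where is this vehicle found?", "answer": "road"},
    {"question": "What is this furniture for?", "answer": "sitting"},
]

train, test = answer_based_zero_shot_split(samples)
train_answers = {s["answer"] for s in train}
test_answers = {s["answer"] for s in test}
```

Splitting by answer rather than by sample is what forces a model to predict answers it has never been trained on; a random per-sample split would typically leak most answers into training.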