CL LG Dec 31, 2020

Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings

arXiv:2012.15484v20.711 citations

Originality: Highly original

AI Analysis

This work is significant for VQA systems that need to reason over incomplete knowledge graphs, which is a common problem in real-world applications.

The paper addresses Fact-based Visual Question Answering (FVQA) over incomplete knowledge graphs (KGs). It proposes a novel architecture that uses KG embeddings and an 'Image-as-Knowledge' representation, achieving performance comparable to the state of the art (SOTA) in standard answer retrieval and a 26% absolute improvement in a missing-edge reasoning task.

Fact-based Visual Question Answering (FVQA), a challenging variant of VQA, requires a QA system to include facts from a diverse knowledge graph (KG) in its reasoning process to produce an answer. Large KGs, especially common-sense KGs, are known to be incomplete, i.e., a fact that is absent from the KG is not necessarily false. Therefore, being able to reason over incomplete KGs for QA is a critical requirement in real-world applications that has not been addressed extensively in the literature. We develop a novel QA architecture that allows us to reason over incomplete KGs, something current FVQA state-of-the-art (SOTA) approaches lack due to their critical reliance on fact retrieval. We use KG embeddings, a technique widely used for KG completion, for the downstream task of FVQA. We also employ a new image representation technique we call 'Image-as-Knowledge' to enable this capability, alongside a simple one-step CoAttention mechanism to attend to text and image during QA. Our FVQA architecture is faster at inference time, being O(m), as opposed to existing FVQA SOTA methods, which are O(N log N), where m is the number of vertices and N is the number of edges, with N = O(m^2). KG embeddings are shown to hold information complementary to word embeddings: a combination of the two permits performance comparable to SOTA methods in the standard answer retrieval task, and significantly better performance (26% absolute) in the proposed missing-edge reasoning task.
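The O(m) inference cost mentioned in the abstract can be illustrated with a minimal sketch. It assumes that answering reduces to scoring a predicted vector against every entity embedding in a single pass; the abstract does not spell out the retrieval step, so this is an illustration under that assumption, not the authors' implementation. The names `cosine` and `retrieve_answer`, the choice of cosine similarity, and the toy 3-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_answer(query_vec, entity_embeddings):
    """Return the entity whose embedding is most similar to query_vec.

    The loop visits each of the m entities exactly once, so the cost is
    O(m) for a fixed embedding dimension. No edges (facts) are enumerated.
    """
    best_entity, best_score = None, -math.inf
    for entity, emb in entity_embeddings.items():
        score = cosine(query_vec, emb)
        if score > best_score:
            best_entity, best_score = entity, score
    return best_entity, best_score


# Toy embeddings with made-up values (assumption: not taken from the paper).
entity_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.3, 0.1],
    "car": [0.0, 0.2, 0.9],
}

# Hypothetical vector that a QA model might predict from image and question.
predicted = [0.1, 0.1, 1.0]

answer, score = retrieve_answer(predicted, entity_embeddings)
print(answer)
```

Because the search runs over vertices rather than stored facts, an answer can still be ranked highly when the edge linking it to the question entity is missing from the KG, which is the setting the missing-edge task targets.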
