CL | CV | Apr 20, 2021

GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering

arXiv:2104.10283v2732 citations
Originality: Incremental advance
AI Analysis

This work addresses visual question answering for AI systems by improving accuracy on structured image representations, though it is incremental as it builds on existing graph neural network methods.

The authors tackle visual question answering on scene graphs by proposing GraphVQA, a language-guided graph neural network framework that translates a question into multiple iterations of message passing among graph nodes. It reaches 94.78% accuracy on the GQA dataset, compared with 88.43% for the previous state-of-the-art model.

Images are more than a collection of objects or attributes -- they represent a web of relationships among interconnected objects. The scene graph has emerged as a new modality for a structured graphical representation of images. A scene graph encodes objects as nodes, connected via pairwise relations as edges. To support question answering on scene graphs, we propose GraphVQA, a language-guided graph neural network framework that translates and executes a natural language question as multiple iterations of message passing among graph nodes. We explore the design space of the GraphVQA framework and discuss the trade-offs of different design choices. Our experiments on the GQA dataset show that GraphVQA outperforms the state-of-the-art model by a large margin (94.78% vs. 88.43%).
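To make the idea in the abstract concrete, here is a minimal sketch of question-guided message passing over a toy scene graph. It is not the authors' implementation. The function names (`message_passing_step`, `run_instructions`), the sigmoid gating, the residual update, and the hand-written feature vectors are all assumptions chosen for illustration; this summary does not describe the paper's actual question encoder or GNN layers.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def message_passing_step(node_feats, edges, edge_feats, instruction):
    """One round of message passing guided by an instruction vector.

    Each directed edge (src, dst) forwards the source node's features to the
    target node, scaled by a gate that scores how relevant the edge's relation
    is to the current instruction (hypothetical gating, for illustration).
    """
    dim = len(instruction)
    incoming = [[0.0] * dim for _ in node_feats]
    for (src, dst), rel in zip(edges, edge_feats):
        gate = 1.0 / (1.0 + math.exp(-dot(rel, instruction)))  # sigmoid
        for k in range(dim):
            incoming[dst][k] += gate * node_feats[src][k]
    # Residual update: keep the old node state and add the aggregated messages.
    return [[h + m for h, m in zip(hv, mv)]
            for hv, mv in zip(node_feats, incoming)]

def run_instructions(node_feats, edges, edge_feats, instructions):
    """Execute a question as a sequence of instruction vectors, one round each."""
    for instruction in instructions:
        node_feats = message_passing_step(node_feats, edges, edge_feats, instruction)
    return node_feats

# Toy scene graph: "man riding horse", "man wearing hat".
nodes = ["man", "horse", "hat"]
node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up object features
edges = [(0, 1), (0, 2)]                           # (source, target) index pairs
edge_feats = [[1.0, 0.0], [0.0, 1.0]]              # made-up relation features

# A question would be encoded into one instruction vector per round;
# here two vectors are written by hand.
instructions = [[2.0, -2.0], [0.0, 0.0]]
final_feats = run_instructions(node_feats, edges, edge_feats, instructions)
```

In this sketch the first instruction favours the "riding" edge over the "wearing" edge, so more of the "man" features flow to "horse" than to "hat" in that round. A real system would learn the instruction vectors and the gates from data and would add an answer classifier on top of the final node states.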

Code Implementations: 1 repo
