Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering
This work addresses the need for interpretable AI methods in VQA, offering an intrinsic approach that bridges the gap between explainability and performance. The contribution is incremental, however, because it builds on existing graph-based VQA techniques.
The authors tackled the problem of explainability in graph-based visual question answering by introducing an interpretable model that intrinsically generates explanatory subgraphs during the answer prediction process, achieving competitive performance on the GQA dataset.
The success of deep-learning-based methods in Visual Question Answering (VQA) has increased the demand for explainable methods. Most methods in Explainable Artificial Intelligence (XAI) generate post-hoc explanations rather than taking an intrinsic approach, which is what characterizes an interpretable model. In this work, we introduce an interpretable approach for graph-based VQA and demonstrate competitive performance on the GQA dataset, bridging the gap between interpretability and performance. Our model is designed to intrinsically produce a subgraph during the question-answering process as its explanation, providing insight into its decision-making. To evaluate the quality of these generated subgraphs, we compare them against established post-hoc explainability methods for graph neural networks and perform a human evaluation. Moreover, we present quantitative metrics that correlate with the judgments of human assessors and can therefore serve as automatic metrics for the generated explanatory subgraphs. Our implementation is available at https://github.com/DigitalPhonetics/Intrinsic-Subgraph-Generation-for-VQA.
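To make the idea of an intrinsic explanatory subgraph concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy scene graph with hand-written node features, a question vector in the same space, dot-product relevance scoring, and a hard top-k node selection; all names (`select_subgraph`, `pool`, the example nodes) are hypothetical. The point it illustrates is that the answer representation is computed only from the selected nodes, so the subgraph is part of the prediction path rather than a post-hoc attribution.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]
Edge = Tuple[str, str]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_subgraph(
    node_feats: Dict[str, Vector],
    edges: List[Edge],
    question: Vector,
    k: int,
) -> Tuple[Set[str], List[Edge]]:
    """Keep the k nodes most relevant to the question and the edges between them.

    Assumption: relevance is a simple dot product; a real model would learn it.
    """
    scores = {name: dot(feat, question) for name, feat in node_feats.items()}
    # Sort by descending score; break ties by name so the result is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    kept = set(ranked[:k])
    sub_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, sub_edges


def pool(node_feats: Dict[str, Vector], kept: Set[str]) -> Vector:
    """Mean-pool features of the kept nodes only; discarded nodes cannot
    influence the downstream answer prediction."""
    dim = len(next(iter(node_feats.values())))
    return [
        sum(node_feats[name][i] for name in sorted(kept)) / len(kept)
        for i in range(dim)
    ]


# Toy scene graph (hypothetical data for illustration only).
node_feats = {
    "cup": [1.0, 0.0],
    "table": [0.8, 0.2],
    "window": [0.0, 1.0],
    "sky": [0.1, 0.9],
}
edges = [("cup", "table"), ("table", "window"), ("window", "sky")]
question = [1.0, 0.0]  # stands in for an encoded question about the cup

kept, sub_edges = select_subgraph(node_feats, edges, question, k=2)
pooled = pool(node_feats, kept)
```

In this sketch, `kept` and `sub_edges` play the role of the explanation, and `pooled` is what a hypothetical answer classifier would consume. A hard top-k selection is used here only for simplicity; it is a stand-in for whatever selection mechanism the actual model uses.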