CV | Jul 3, 2023

Localized Questions in Medical Visual Question Answering

Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman

arXiv:2307.01067v15.012 citations | h-index: 35 | Has Code

Originality: Highly original

AI Analysis

The paper addresses the limited interpretability and probing ability of medical VQA models in healthcare applications, offering a novel method for a known bottleneck.

The paper tackles the limitation of existing medical VQA models that cannot answer questions about specific image regions, proposing a novel approach that outperforms existing methods on three datasets.

Visual Question Answering (VQA) models aim to answer natural language questions about given images. Because VQA allows asking questions that differ from those used to train the model, medical VQA has received substantial attention in recent years. However, existing medical VQA models typically answer questions about an entire image rather than about where the relevant content is located within it. Consequently, these models offer limited interpretability and limited ability to be probed about specific image regions. This paper proposes a novel approach for medical VQA that addresses this limitation with a model that answers questions about image regions while considering the context needed to answer them. Our experimental results demonstrate the effectiveness of the proposed model, which outperforms existing methods on three datasets. Our code and data are available at https://github.com/sergiotasconmorales/locvqa.
