CV AI · Jul 16, 2024

TM-PATHVQA: 90000+ Textless Multilingual Questions for Medical Visual Question Answering

Tonmoy Rajkhowa, Amartya Roy Chowdhury, Sankalp Nagaonkar, Achyut Mani Tripathi

arXiv:2407.11383v13.73 citations · h-index: 10

Originality: Synthesis-oriented

AI Analysis

This paper addresses the need for hands-free interaction and accessibility in medical diagnostics for healthcare professionals, but its contribution is incremental: it expands an existing dataset and applies known methods to a new modality.

The paper tackles the limitation of text-based medical visual question answering (VQA) systems by introducing a speech-based VQA system built on a new dataset, TM-PATHVQA. The dataset includes 98,397 multilingual spoken questions and answers based on 5,004 pathological images, along with 70 hours of audio. The paper also benchmarks various combinations of acoustic and visual features.

In healthcare and medical diagnostics, Visual Question Answering (VQA) may emerge as a pivotal tool in scenarios where analysis of intricate medical images becomes critical for accurate diagnoses. Current text-based VQA systems limit their utility in scenarios where hands-free interaction and accessibility are crucial while performing tasks. A speech-based VQA system may provide a better means of interaction where information can be accessed while performing tasks simultaneously. To this end, this work implements a speech-based VQA system by introducing a Textless Multilingual Pathological VQA (TMPathVQA) dataset, an expansion of the PathVQA dataset, containing spoken questions in English, German & French. This dataset comprises 98,397 multilingual spoken questions and answers based on 5,004 pathological images along with 70 hours of audio. Finally, this work benchmarks and compares TMPathVQA systems implemented using various combinations of acoustic and visual features.
