MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
This work addresses the need for more interpretable AI in healthcare, which can help medical experts reach faster and more accurate diagnoses. Its contribution is incremental, since it builds on existing MedVQA datasets and methods.
The paper tackles the limited interpretability and transparency of Medical Visual Question Answering (MedVQA) systems. It introduces new benchmark datasets annotated with decision-making rationales, together with a novel framework, MedThink. MedThink achieves accuracies of 83.5% on R-RAD, 86.3% on R-SLAKE, and 87.2% on R-Path, significantly outperforming existing state-of-the-art models with comparable parameter counts.
Medical Visual Question Answering (MedVQA), which offers language responses to image-based medical inquiries, is a challenging task and a significant advancement in healthcare. It helps medical experts swiftly interpret medical images, thereby enabling faster and more accurate diagnoses. However, the interpretability and transparency of existing MedVQA solutions are often limited, which makes their decision-making processes difficult to understand. To address this issue, we devise a semi-automated annotation process to streamline data preparation and build new benchmark MedVQA datasets: R-RAD, R-SLAKE and R-Path. These datasets provide intermediate medical decision-making rationales, generated by multimodal large language models and human annotations, for question-answering pairs in existing MedVQA datasets, i.e., VQA-RAD, SLAKE and PathVQA. Moreover, we design a novel framework, MedThink, which fine-tunes lightweight pretrained generative models by incorporating medical decision-making rationales. MedThink includes three distinct strategies to generate decision outcomes and corresponding rationales, thereby clearly showcasing the medical decision-making process during reasoning. Our comprehensive experiments show that our method achieves an accuracy of 83.5% on R-RAD, 86.3% on R-SLAKE and 87.2% on R-Path. These results significantly exceed those of existing state-of-the-art models with comparable parameter counts. Datasets and code will be released.