CV | Feb 9, 2025

ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images

arXiv:2502.05928v4 | 2 citations | h-index: 10 | Has Code
Originality: Highly original
AI Analysis

This work addresses inadequate image-text alignment and insufficient domain-specific knowledge in medical visual question answering, offering an incremental solution for researchers and practitioners in medical imaging.

The ClinKD framework tackles the limitations of multimodal large language models in handling multi-task medical visual question answering, achieving state-of-the-art performance on several challenging datasets. The approach significantly improves image-text alignment and enables models to adapt to medical knowledge.

Medical Visual Question Answering (Med-VQA) is a critical and challenging subtask within the general VQA domain. Despite significant advances in general VQA, multimodal large language models (MLLMs) still exhibit substantial limitations in multi-task VQA scenarios. These limitations show up as erroneous spatial localization and misinterpretation of medical images, and they arise primarily from two fundamental issues: inadequate image-text alignment and insufficient domain-specific knowledge for medical applications. To address these issues, we introduce the Cross-Modal Clinical Knowledge Distiller (ClinKD), a framework designed to enhance image-text alignment and establish more effective medical knowledge transformation mechanisms, which enables MLLMs to perform better even when they lack prior medical knowledge. Our extensive experimental evaluations demonstrate that ClinKD achieves state-of-the-art performance on several challenging Med-VQA datasets. The results indicate that our approach not only significantly improves image-text alignment but also effectively enables MLLMs to adapt to medical knowledge. The source code for ClinKD is available at: https://github.com/overloadedHenry/ClinKD.
