Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models
This work targets researchers in medical AI and addresses how to exploit complex anatomical structures in biomedical images, though its contribution is incremental because it builds on existing prompt learning methods.
The paper tackles the adaptation of vision-language models to few-shot biomedical image classification. It proposes Biomed-DPT, a dual modality prompt tuning technique that incorporates clinical knowledge and attention re-weighting. Biomed-DPT achieves an average accuracy of 66.14% across 11 datasets, outperforming CoOp by 6.20% on average and by up to 8.04% on novel classes.
Prompt learning is one of the most effective paradigms for adapting pre-trained vision-language models (VLMs) to biomedical image classification tasks in few-shot scenarios. However, most current prompt learning methods use only text prompts and ignore the particular structures in biomedical images, such as complex anatomical structures and subtle pathological features. In this work, we propose Biomed-DPT, a knowledge-enhanced dual modality prompt tuning technique. For the text prompt, Biomed-DPT constructs a dual prompt consisting of template-driven clinical prompts and large language model (LLM)-driven domain-adapted prompts, then extracts clinical knowledge from the domain-adapted prompts through knowledge distillation. For the vision prompt, Biomed-DPT introduces a zero vector as a soft prompt to leverage attention re-weighting, so that the model avoids focusing on non-diagnostic regions and recognizing non-critical pathological features. Biomed-DPT achieves an average classification accuracy of 66.14\% across 11 biomedical image datasets covering 9 modalities and 10 organs, with performance reaching 78.06\% in base classes and 75.97\% in novel classes, surpassing the Context Optimization (CoOp) method by 6.20\%, 3.78\%, and 8.04\%, respectively. Our code is available at \underline{https://github.com/Kanyooo/Biomed-DPT}.
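To give some intuition for how a zero-vector prompt could re-weight attention, the following is a minimal NumPy sketch. It is not the authors' implementation. It assumes a single attention head, random queries and keys, and that the zero prompt enters attention as one extra key whose logit is exactly 0; the names `attention_weights`, `n_tokens`, and `dim` are hypothetical. Under these assumptions, the extra key adds a constant term to every softmax denominator, so each query places less total attention mass on the image tokens while their relative ordering is unchanged.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(q, k):
    # Scaled dot-product attention weights for a single head.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d))

rng = np.random.default_rng(0)
n_tokens, dim = 4, 8  # hypothetical sizes, for illustration only
q = rng.normal(size=(n_tokens, dim))
k = rng.normal(size=(n_tokens, dim))

# Baseline: each query distributes all of its attention over image tokens.
w_plain = attention_weights(q, k)

# Assumption: the zero-vector prompt appears as one extra all-zero key,
# so its attention logit is 0 for every query.
k_prompted = np.vstack([k, np.zeros((1, dim))])
w_prompted = attention_weights(q, k_prompted)

# Attention mass that each query still places on the original image tokens.
mass_on_patches = w_prompted[:, :n_tokens].sum(axis=1)

# Renormalizing recovers the baseline weights: the zero prompt rescales
# each query's attention without changing the ranking of image tokens.
w_renormalized = w_prompted[:, :n_tokens] / mass_on_patches[:, None]
```

In this sketch, queries whose logits against the image tokens are all low lose the largest share of their mass to the zero key. If the matching value vector is also zero, those queries contribute proportionally less to the attention output, which is one way to read "attention re-weighting".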