CV, Sep 3, 2024

Can language-guided unsupervised adaptation improve medical image classification using unpaired images and texts?

Umaima Rahman, Raza Imam, Mohammad Yaqub, Boulbaba Ben Amor, Dwarikanath Mahapatra

arXiv:2409.02729v25.22 citations, h-index: 25, Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of limited labeled data for medical image classification in healthcare, offering an incremental improvement by adapting existing VLMs with unsupervised techniques.

The paper tackles the challenge of scarce labeled medical images by proposing MedUnA, an unsupervised adaptation method for Vision-Language Models that uses unpaired images and text to improve medical image classification, achieving significant accuracy gains over zero-shot baselines on chest X-ray, diabetic retinopathy, and skin lesion datasets.

In medical image classification, supervised learning is challenging due to the scarcity of labeled medical images. To address this, we leverage the visual-textual alignment within Vision-Language Models (VLMs) to enable unsupervised learning of a medical image classifier. In this work, we propose Medical Unsupervised Adaptation (MedUnA) of VLMs, where the LLM-generated descriptions for each class are encoded into text embeddings and matched with class labels via a cross-modal adapter. This adapter attaches to a visual encoder of MedCLIP and aligns the visual embeddings through unsupervised learning, driven by a contrastive entropy-based loss and prompt tuning. This improves performance in scenarios where textual information is more abundant than labeled images, particularly in the healthcare domain. Unlike traditional VLMs, MedUnA uses unpaired images and text for learning representations and enhances the potential of VLMs beyond traditional constraints. We evaluate the performance on three chest X-ray datasets and two multi-class datasets (diabetic retinopathy and skin lesions), showing significant accuracy gains over the zero-shot baseline. Our code is available at https://github.com/rumaima/meduna.
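The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the linear adapter, the hand-made 2-D vectors, and the training settings are all hypothetical, and plain Shannon entropy stands in for the paper's contrastive entropy-based loss, whose exact form the abstract does not give. In practice the embeddings would come from a text encoder and from the MedCLIP visual encoder.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adapter_logits(weights, embedding):
    """Hypothetical linear adapter: one weight row per class."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]

def entropy(probs):
    """Shannon entropy; low values mean a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def train_text_adapter(text_embeddings, labels, num_classes,
                       lr=0.5, epochs=200, seed=0):
    """Stage 1 (assumed form): fit the adapter on text embeddings of
    class descriptions with cross-entropy against the class labels."""
    rng = random.Random(seed)
    dim = len(text_embeddings[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(num_classes)]
    for _ in range(epochs):
        for emb, label in zip(text_embeddings, labels):
            probs = softmax(adapter_logits(weights, emb))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the logit of class c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                for d in range(dim):
                    weights[c][d] -= lr * grad * emb[d]
    return weights

# Toy stand-ins for embeddings of LLM-generated class descriptions.
text_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
weights = train_text_adapter(text_embeddings, labels, num_classes=2)

# Stage 2 (assumed form): apply the same adapter to an unlabeled image
# embedding; an entropy term like this could drive unsupervised tuning.
image_embedding = [0.8, 0.2]
probs = softmax(adapter_logits(weights, image_embedding))
predicted_class = probs.index(max(probs))
image_entropy = entropy(probs)
```

The design point the sketch tries to capture is that the adapter never sees a labeled image: labels enter only through the text side, and the image side contributes only a confidence signal.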
