Text-guided Foundation Model Adaptation for Long-Tailed Medical Image Classification
This work addresses data scarcity and imbalance in medical imaging for rare diseases, representing an incremental improvement through adaptation of existing foundation models.
The paper tackles the problem of low diagnostic accuracy on long-tailed medical image datasets by adapting a text-guided foundation model, improving accuracy by up to 27.1% while using only 6.1% of the GPU memory required by the best existing method.
In medical contexts, the imbalanced data distribution in long-tailed datasets, due to scarce labels for rare diseases, greatly impairs the diagnostic accuracy of deep learning models. Recent multimodal text-image supervised foundation models offer new solutions to data scarcity through effective representation learning. However, their limited medical-specific pretraining hinders their performance in medical image classification relative to natural images. To address this issue, we propose a novel Text-guided Foundation model Adaptation for Long-Tailed medical image classification (TFA-LT). We adopt a two-stage training strategy, integrating representations from the foundation model using just two linear adapters and a single ensembler for balanced outcomes. Experimental results on two long-tailed medical image datasets validate the simplicity, lightweight design, and efficiency of our approach: requiring only 6.1% of the GPU memory used by the current best-performing algorithm, our method achieves an accuracy improvement of up to 27.1%, highlighting the substantial potential of foundation model adaptation in this area.
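To make the "two linear adapters and a single ensembler" design concrete, the following is a minimal forward-pass sketch in NumPy. It is an illustration under stated assumptions, not the TFA-LT implementation: the abstract does not specify how the adapters are wired, what each training stage optimizes, or how the ensembler combines outputs. Here we assume a CLIP-style setup with frozen image and text features, one adapter per training stage, cosine-similarity logits against class text embeddings, and an ensembler that is a simple convex combination of the two sets of logits. All names (`LinearAdapter`, `class_logits`, `ensemble`, `alpha`, `scale`) and all sizes are hypothetical, and random arrays stand in for the foundation model's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class LinearAdapter:
    """A single linear map applied on top of frozen features (hypothetical)."""

    def __init__(self, dim, rng):
        # Assumption: near-identity initialization, so the adapter starts
        # close to the frozen foundation-model features.
        self.W = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))

    def __call__(self, x):
        return x @ self.W


def class_logits(image_feats, text_feats, adapter, scale=10.0):
    """Cosine-similarity logits between adapted image features and class text embeddings."""
    img = l2_normalize(adapter(image_feats))
    txt = l2_normalize(text_feats)
    return scale * (img @ txt.T)


def ensemble(logits_a, logits_b, alpha):
    """Assumed ensembler: convex combination of the two adapters' logits."""
    return alpha * logits_a + (1.0 - alpha) * logits_b


# Hypothetical sizes; random arrays stand in for frozen encoder outputs.
batch, dim, num_classes = 4, 16, 5
image_feats = rng.standard_normal((batch, dim))       # frozen image encoder output
text_feats = rng.standard_normal((num_classes, dim))  # frozen text encoder output for class prompts

# Assumption: stage one fits an adapter on the original long-tailed distribution,
# stage two fits a second adapter with class-balanced sampling. Training is omitted.
adapter_stage1 = LinearAdapter(dim, rng)
adapter_stage2 = LinearAdapter(dim, rng)

logits = ensemble(
    class_logits(image_feats, text_feats, adapter_stage1),
    class_logits(image_feats, text_feats, adapter_stage2),
    alpha=0.5,
)

# Numerically stable softmax over classes.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
pred = probs.argmax(axis=1)
```

In this sketch the only trainable parameters would be the two `dim × dim` adapter matrices and the scalar `alpha`, while the foundation model stays frozen. That is one plausible way such an approach could keep GPU memory low, though the actual source of the reported savings is not stated in the abstract.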