CV | Jun 17, 2025

Interpreting Biomedical VLMs on High-Imbalance Out-of-Distributions: An Insight into BiomedCLIP on Radiology

Nafiz Sadman, Farhana Zulkernine, Benjamin Kwan

arXiv:2506.14136v1 | 6.21 citations | h-index: 22 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the reliability of vision-language models in real-world medical imaging, though it is incremental because it evaluates an existing model rather than proposing a new one.

The paper investigates BiomedCLIP's performance on a highly imbalanced, out-of-distribution medical dataset (IU-xray), finding that zero-shot inference leads to poor precision and class separability, while full fine-tuning improves disease classification.

In this paper, we pursue two research objectives: i) explore the learned embedding space of BiomedCLIP, an open-source large vision-language model, to analyse meaningful class separations, and ii) quantify the limitations of BiomedCLIP when applied to a highly imbalanced, out-of-distribution multi-label medical dataset. We experiment on the IU-xray dataset, which meets both criteria, and evaluate BiomedCLIP on classifying images (radiographs) in three settings: zero-shot inference, full fine-tuning, and linear probing. The results show that under zero-shot settings the model over-predicts all labels, leading to poor precision and inter-class separability. Full fine-tuning improves classification of distinct diseases, while linear probing detects overlapping features. We illustrate the model's visual understanding using Grad-CAM heatmaps and compare them with 15 annotations by a radiologist. We highlight the need for careful adaptation of such models to foster reliability and applicability in real-world settings. The code for the experiments in this work is available and maintained on GitHub.
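To make the over-prediction finding concrete, the following is a minimal sketch (not the authors' code) of why predicting every label as positive on an imbalanced multi-label dataset yields perfect recall but poor precision. The toy label matrix, its prevalences, and the helper name `per_label_precision_recall` are assumptions made for illustration; they are not taken from IU-xray or from the paper's results.

```python
def per_label_precision_recall(y_true, y_pred):
    """Return {label_index: (precision, recall)} for binary multi-label matrices."""
    n_labels = len(y_true[0])
    scores = {}
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        scores[j] = (precision, recall)
    return scores


# Hypothetical ground truth: 10 images, 3 labels with prevalence 5/10, 2/10, 1/10.
y_true = [
    [1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 0, 1],
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]

# A degenerate classifier that, like an over-predicting model, flags every label.
y_pred_all = [[1, 1, 1] for _ in y_true]

scores = per_label_precision_recall(y_true, y_pred_all)
for label, (precision, recall) in scores.items():
    print(f"label {label}: precision={precision:.2f} recall={recall:.2f}")
```

In this degenerate case each label's precision equals its prevalence, so the rarer the finding, the lower the precision, even though recall stays at 1.0 for every label.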
