CV | AI | Apr 22, 2025

Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis

arXiv:2504.16047v1 | 1 citation | h-index: 24
Originality: Synthesis-oriented
AI Analysis

This work offers guidance on selecting foundation models in radiology, a domain-specific problem for medical AI practitioners. It is incremental, however, because it compares existing methods on new data.

This study evaluated three vision-language foundation models (RAD-DINO, CheXagent, and BiomedCLIP) on classification, segmentation, and regression tasks for pneumothorax and cardiomegaly on chest radiographs. RAD-DINO excelled in segmentation and CheXagent in classification, while a custom segmentation model improved performance for all three foundation models.

Foundation models, trained on vast amounts of data using self-supervised techniques, have emerged as a promising frontier for advancing artificial intelligence (AI) applications in medicine. This study evaluates three different vision-language foundation models (RAD-DINO, CheXagent, and BiomedCLIP) on their ability to capture fine-grained imaging features for radiology tasks. The models were assessed across classification, segmentation, and regression tasks for pneumothorax and cardiomegaly on chest radiographs. Self-supervised RAD-DINO consistently excelled in segmentation tasks, while text-supervised CheXagent demonstrated superior classification performance. BiomedCLIP showed inconsistent performance across tasks. A custom segmentation model that integrates global and local features substantially improved performance for all foundation models, particularly for challenging pneumothorax segmentation. The findings highlight that pre-training methodology significantly influences model performance on specific downstream tasks. For fine-grained segmentation tasks, models trained without text supervision performed better, while text-supervised models offered advantages in classification and interpretability. These insights provide guidance for selecting foundation models based on specific clinical applications in radiology.
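The abstract does not describe how the custom segmentation model combines global and local features. As a rough illustration of one common pattern, the sketch below tiles an image-level embedding across a grid of patch embeddings, concatenates the two, and applies a per-patch linear head to produce a coarse mask. Everything here is an assumption for illustration, not the authors' architecture: the function names `fuse_global_local` and `linear_mask_head`, the 14x14 grid, the embedding size, and the random stand-in features are all hypothetical.

```python
import numpy as np


def fuse_global_local(patch_tokens, global_token):
    """Concatenate the image-level embedding onto every patch embedding.

    patch_tokens: (H, W, D) grid of local features from a frozen encoder
    global_token: (D,) image-level feature (e.g. a CLS-style embedding)
    returns:      (H, W, 2 * D) fused features
    """
    h, w, d = patch_tokens.shape
    tiled = np.broadcast_to(global_token, (h, w, d))
    return np.concatenate([patch_tokens, tiled], axis=-1)


def linear_mask_head(fused, weight, bias):
    """Per-patch linear layer plus sigmoid, giving (H, W) probabilities."""
    logits = fused @ weight + bias
    return 1.0 / (1.0 + np.exp(-logits))


# Random stand-ins for encoder outputs (hypothetical sizes).
rng = np.random.default_rng(0)
patch_tokens = rng.normal(size=(14, 14, 8))
global_token = rng.normal(size=8)

fused = fuse_global_local(patch_tokens, global_token)
weight = rng.normal(size=16) * 0.1  # untrained weights, for shape checking only
mask = linear_mask_head(fused, weight, 0.0)

print(fused.shape, mask.shape)
```

In a real pipeline the weights would be trained against ground-truth masks and the coarse patch-level output upsampled to image resolution. The sketch only shows how an image-level signal can be made available at every spatial location.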

