CV · Mar 3

Semi-Supervised Few-Shot Adaptation of Vision-Language Models

arXiv:2603.02959v11.5h-index: 12

Originality: Highly original

AI Analysis

The proposed method is significant for medical imaging applications where annotated data is scarce and expensive, providing an incremental solution for improving model performance in low-shot regimes.

The authors address few-shot adaptation of vision-language models in medical imaging and report a reduction in labeling effort of more than 50% in low-shot regimes. The gain comes from leveraging unlabeled data through a semi-supervised solver that propagates text-informed pseudo-labels.

Vision-language models (VLMs) pre-trained on large, heterogeneous data sources are becoming increasingly popular, providing rich multi-modal embeddings that enable efficient transfer to new tasks. A particularly relevant application is few-shot adaptation, where only a handful of annotated examples are available to adapt the model through multi-modal linear probes. In medical imaging, specialized VLMs have shown promising performance in zero- and few-shot image classification, which is valuable for mitigating the high cost of expert annotations. However, challenges remain in extremely low-shot regimes: the inherent class imbalances in medical tasks often lead to underrepresented categories, penalizing overall model performance. To address this limitation, we propose leveraging unlabeled data by introducing an efficient semi-supervised solver that propagates text-informed pseudo-labels during few-shot adaptation. The proposed method enables lower-budget annotation pipelines for adapting VLMs, reducing labeling effort by >50% in low-shot regimes.
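The abstract does not spell out the solver, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes pre-computed image and text embeddings, builds soft pseudo-labels for unlabeled images from their cosine similarity to the class text embeddings, and fits a linear probe initialized from those text embeddings on labeled and pseudo-labeled samples together. The function names, `temperature`, `unl_weight`, and the plain gradient-descent loop are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def text_pseudo_labels(image_feats, text_feats, temperature=0.1):
    """Soft pseudo-labels from image-to-text cosine similarity (zero-shot style)."""
    sims = l2_normalize(image_feats) @ l2_normalize(text_feats).T
    return softmax(sims / temperature)


def semi_supervised_probe(x_lab, y_lab, x_unl, text_feats,
                          unl_weight=0.5, lr=0.01, steps=200, temperature=0.1):
    """Fit a linear probe on few labeled shots plus pseudo-labeled unlabeled data.

    Assumption: this is a generic pseudo-labeling baseline, not the paper's solver.
    """
    x_lab, x_unl = l2_normalize(x_lab), l2_normalize(x_unl)
    # Initialize class weights from the text embeddings (multi-modal probe).
    w = l2_normalize(text_feats).copy()
    num_classes = w.shape[0]

    # Hard one-hot targets for labeled shots, soft text-informed targets otherwise.
    targets = np.vstack([
        np.eye(num_classes)[y_lab],
        text_pseudo_labels(x_unl, text_feats, temperature),
    ])
    x = np.vstack([x_lab, x_unl])

    # Down-weight pseudo-labeled samples relative to the labeled shots.
    sample_w = np.concatenate([
        np.ones(len(x_lab)),
        np.full(len(x_unl), unl_weight),
    ])
    sample_w /= sample_w.sum()

    for _ in range(steps):
        probs = softmax(x @ w.T / temperature)
        # Gradient of the weighted cross-entropy with respect to w.
        grad = ((probs - targets) * sample_w[:, None]).T @ x / temperature
        w -= lr * grad
    return w
```

Predictions for a new batch `x_new` would then be `(l2_normalize(x_new) @ w.T).argmax(axis=1)`. Keeping the pseudo-label targets fixed is the simplest choice; an iterative scheme that refreshes them as the probe improves is a natural variant, but whether the paper does this is not stated here.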

