Fairness in Multi-modal Medical Diagnosis with Demonstration Selection
This work addresses fairness concerns in medical AI across diverse patient groups and offers a scalable, data-efficient solution, though the contribution is incremental because it builds on existing in-context learning methods.
The paper tackles fairness issues in multimodal large language models for medical image diagnosis by proposing Fairness-Aware Demonstration Selection (FADS), which reduces gender-, race-, and ethnicity-related disparities while maintaining strong accuracy across multiple benchmarks.
Multimodal large language models (MLLMs) have shown strong potential for medical image reasoning, yet fairness across demographic groups remains a major concern. Existing debiasing methods often rely on large labeled datasets or fine-tuning, which are impractical for foundation-scale models. We explore in-context learning (ICL) as a lightweight, tuning-free alternative for improving fairness. Through systematic analysis, we find that conventional demonstration selection (DS) strategies fail to ensure fairness because the exemplars they select are demographically imbalanced. To address this, we propose Fairness-Aware Demonstration Selection (FADS), which builds demographically balanced and semantically relevant demonstrations via clustering-based sampling. Experiments on multiple medical imaging benchmarks show that FADS consistently reduces gender-, race-, and ethnicity-related disparities while maintaining strong accuracy. These results highlight the potential of fairness-aware in-context learning as a scalable and data-efficient solution for equitable medical image reasoning.
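As a rough illustration of the idea in the abstract, the following Python sketch contrasts plain nearest-neighbour demonstration selection with a demographically balanced variant. It is not the paper's implementation: the abstract says only that FADS uses clustering-based sampling, so the tiny k-means, the round-robin over groups, the function names (`kmeans`, `select_nearest`, `select_balanced`), the `emb` and `group` fields, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=20):
    """Tiny deterministic k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def select_nearest(pool, query, n_demos):
    """Baseline: take the n_demos candidates closest to the query."""
    return sorted(pool, key=lambda d: math.dist(query, d["emb"]))[:n_demos]


def select_balanced(pool, query, n_demos, n_clusters=2):
    """Hypothetical balanced selection (an assumption, not the paper's FADS).

    1. Cluster the candidate embeddings.
    2. Rank clusters by how close their centroid is to the query.
    3. Per demographic group, order candidates by (cluster rank, distance).
    4. Draw from the groups in round-robin order until n_demos are chosen.
    """
    centroids, assign = kmeans([d["emb"] for d in pool], n_clusters)
    order = sorted(range(n_clusters), key=lambda c: math.dist(query, centroids[c]))
    rank = {c: r for r, c in enumerate(order)}

    queues = defaultdict(list)
    for d, a in zip(pool, assign):
        queues[d["group"]].append((rank[a], math.dist(query, d["emb"]), d))
    for q in queues.values():
        q.sort(key=lambda t: t[:2])  # sort on rank and distance only

    chosen = []
    groups = sorted(queues)
    while len(chosen) < n_demos and any(queues[g] for g in groups):
        for g in groups:
            if queues[g] and len(chosen) < n_demos:
                chosen.append(queues[g].pop(0)[2])
    return chosen


# Toy candidate pool: group "M" dominates the region around the query.
pool = [
    {"id": 0, "emb": [0.0, 0.0], "group": "M"},
    {"id": 1, "emb": [0.1, 0.0], "group": "M"},
    {"id": 2, "emb": [0.0, 0.2], "group": "M"},
    {"id": 3, "emb": [0.3, 0.3], "group": "F"},
    {"id": 4, "emb": [5.0, 5.0], "group": "F"},
    {"id": 5, "emb": [5.1, 5.0], "group": "M"},
    {"id": 6, "emb": [5.0, 5.2], "group": "F"},
]
query = [0.0, 0.0]

baseline = select_nearest(pool, query, 4)
balanced = select_balanced(pool, query, 4)
print("baseline groups:", Counter(d["group"] for d in baseline))
print("balanced groups:", Counter(d["group"] for d in balanced))
```

On this toy pool the nearest-neighbour baseline returns three M exemplars and one F, while the balanced variant returns two of each, at the cost of pulling in one exemplar from the farther cluster. That trade-off between relevance and balance is the tension the abstract describes; how FADS actually resolves it is not specified there.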