CV · AI · Feb 2, 2024

Cross-modality debiasing: using language to mitigate sub-population shifts in imaging

arXiv:2403.07888v23.72 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This paper addresses algorithmic bias in machine learning for imaging applications and offers a method to enhance distributional robustness. The advance is incremental because it builds on existing multi-modality models such as CLIP.

The paper tackles sub-population shifts in imaging data, which cause algorithmic bias, by using natural language inputs to debias image feature representations in the CLIP vision-language model, resulting in significant performance improvement and reduced instability on worst-case sub-populations.
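Worst-case sub-population performance is commonly reported as worst-group accuracy: the accuracy of the weakest sub-group, as opposed to the average over all samples. The sketch below illustrates that metric on toy data. The function name `worst_group_accuracy` and the labels, predictions, and group assignments are hypothetical and are not taken from the paper.

```python
import numpy as np

def worst_group_accuracy(y_true, y_pred, groups):
    """Return (worst accuracy, per-group accuracies) over sub-populations.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    per_group = {}
    for g in np.unique(groups):
        mask = groups == g
        # Accuracy restricted to the samples of this sub-population.
        per_group[g.item()] = float((y_true[mask] == y_pred[mask]).mean())
    return min(per_group.values()), per_group

# Toy data (made up): group 0 is classified perfectly, group 1 is not.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0]
groups = [0, 0, 0, 1, 1, 1]

average_acc = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
worst_acc, per_group_acc = worst_group_accuracy(y_true, y_pred, groups)
```

Here the average accuracy hides the gap between the two groups, which is why robustness under sub-population shift is usually judged on the worst group.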

Sub-population shift is a specific type of domain shift in which the data distribution within particular sub-groups or populations changes between training and testing. It is a significant source of algorithmic bias and calls for distributional robustness. Recent studies have found inherent distributional robustness in multi-modality foundation models, such as the vision-language model CLIP, yet this robustness is vulnerable to parameter fine-tuning. In this paper, we propose leveraging the connection between the robustness of different modalities, reshaping the distributional robustness of one modality with another. Specifically, for CLIP, we propose to use natural language inputs to debias the image feature representations and thereby improve worst-case performance on sub-populations. Our extensive empirical studies show that image representations debiased by natural language achieve significant performance improvement and reduced performance instability under sub-population shifts.
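The abstract does not spell out how language is used to debias image features. One simple way to picture the idea, offered only as an assumption and not as the authors' method, is orthogonal projection: take the difference between two text embeddings that describe a spurious attribute, treat it as a bias direction, and remove that direction from each image embedding. In the sketch below, random vectors stand in for CLIP text and image embeddings so that the block runs without downloading a model; `normalize`, `debias`, and all variable names are hypothetical.

```python
import numpy as np

def normalize(x):
    # Scale vectors to unit length along the last axis.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def debias(image_feats, text_a, text_b):
    """Project image features onto the subspace orthogonal to a text-defined direction.

    Illustrative assumption only: text_a and text_b would be embeddings of two
    prompts that differ in the spurious attribute (for example, two backgrounds).
    """
    direction = normalize(text_a - text_b)        # shape (dim,)
    coeffs = image_feats @ direction              # shape (n,)
    projected = image_feats - np.outer(coeffs, direction)
    return normalize(projected)

# Random stand-ins for embeddings in a shared image-text space.
rng = np.random.default_rng(0)
dim = 8
text_a = normalize(rng.normal(size=dim))
text_b = normalize(rng.normal(size=dim))
image_feats = normalize(rng.normal(size=(5, dim)))

direction = normalize(text_a - text_b)
debiased = debias(image_feats, text_a, text_b)

# After projection, the features carry no component along the bias direction.
residual = np.abs(debiased @ direction).max()
```

With a real model, the stand-in vectors would be replaced by the outputs of the text and image encoders, and the debiased features would then be scored against class prompts as usual. Because only the features are adjusted, this kind of approach leaves the model parameters untouched.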
