CVAug 25, 2025

From Global to Local: Social Bias Transfer in CLIP

arXiv:2508.17750v12 citationsh-index: 62025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Originality Synthesis-oriented
AI Analysis

This work addresses bias propagation in widely used CLIP models for AI applications, but it is incremental as it analyzes existing phenomena without proposing new mitigation methods.

The study investigated how social biases in pre-trained CLIP models transfer to downstream tasks like visual question answering, finding that bias measurement depends on data subsets and transfer trends are inconsistent due to representation convergence during adaptation.

The recycling of contrastive language-image pre-trained (CLIP) models as backbones for a large number of downstream tasks calls for a thorough analysis of their transferability implications, especially their well-documented reproduction of social biases and human stereotypes. How do such biases, learned during pre-training, propagate to downstream applications like visual question answering or image captioning? Do they transfer at all? We investigate this phenomenon, referred to as bias transfer in prior literature, through a comprehensive empirical analysis. Firstly, we examine how pre-training bias varies between global and local views of data, finding that bias measurement is highly dependent on the subset of data on which it is computed. Secondly, we analyze correlations between biases in the pre-trained models and the downstream tasks across varying levels of pre-training bias, finding difficulty in discovering consistent trends in bias transfer. Finally, we explore why this inconsistency occurs, showing that under the current paradigm, representation spaces of different pre-trained CLIPs tend to converge when adapted for downstream tasks. We hope this work offers valuable insights into bias behavior and informs future research to promote better bias mitigation practices.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes