Evaluating Perspectival Biases in Cross-Modal Retrieval
This work addresses fairness in multimodal AI systems for users across different languages and cultures. Its contribution is incremental, since it builds on existing bias mitigation research.
The paper tackles perspectival biases in cross-modal retrieval systems, specifically prevalence bias and association bias. It finds that explicit alignment effectively mitigates prevalence bias, while association bias remains more challenging.
Multimodal retrieval systems are expected to operate in a semantic space, agnostic to the language or cultural origin of the query. In practice, however, retrieval outcomes systematically reflect perspectival biases: deviations shaped by linguistic prevalence and cultural associations. We study two such biases. First, prevalence bias refers to the tendency to favor entries from prevalent languages over semantically faithful entries in image-to-text retrieval. Second, association bias refers to the tendency to favor images culturally associated with the query over semantically correct ones in text-to-image retrieval. Results show that explicit alignment is a more effective strategy for mitigating prevalence bias. However, association bias remains a distinct and more challenging problem. These findings suggest that achieving truly equitable multimodal systems requires targeted strategies beyond simple data scaling and that bias arising from cultural association may be treated as a more challenging problem than one arising from linguistic prevalence.
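The two biases defined above can be made concrete with a simple top-1 measurement. The following is a minimal sketch and not the paper's evaluation protocol: the names `Candidate` and `top1_bias_rate`, the boolean labels, and the similarity scores are all hypothetical and chosen only for illustration. For prevalence bias, `biased` would mark a text entry written in a prevalent language. For association bias, it would mark an image culturally associated with the query. In both cases `correct` marks the semantically faithful candidate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    score: float   # similarity to the query, from some retrieval model (assumed)
    correct: bool  # semantically faithful to the query
    biased: bool   # bias-aligned: prevalent language, or culturally associated image


def top1_bias_rate(queries: List[List[Candidate]]) -> float:
    """Fraction of queries whose top-1 candidate is bias-aligned but incorrect."""
    if not queries:
        raise ValueError("need at least one query")
    hits = 0
    for candidates in queries:
        # The retrieved item is the candidate with the highest similarity score.
        top = max(candidates, key=lambda c: c.score)
        if top.biased and not top.correct:
            hits += 1
    return hits / len(queries)


# Illustrative data only: two queries, each with one bias-aligned distractor
# and one semantically correct candidate.
example_queries = [
    [Candidate(0.9, correct=False, biased=True), Candidate(0.8, correct=True, biased=False)],
    [Candidate(0.7, correct=False, biased=True), Candidate(0.95, correct=True, biased=False)],
]
example_rate = top1_bias_rate(example_queries)
```

In this toy data the distractor outranks the correct candidate for the first query only, so half of the queries count as biased retrievals. A rate of zero would mean the system never prefers a bias-aligned distractor over the correct candidate at rank one.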