CV, CL | Jun 10, 2025

CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation

arXiv:2506.09109v2 | h-index: 20
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of equitable AI performance across cultures, which matters to both developers and users. The advance is incremental because it focuses on evaluating cultural bias rather than mitigating it.

The paper tackles the problem of measuring cultural bias in text-to-image models by introducing CAIRe, an evaluation metric that assesses the cultural relevance of an image by grounding its content in a knowledge base. CAIRe improves on all baselines by 22 F1 points on a curated dataset and reaches Pearson correlations of 0.56 and 0.66 with human ratings on two further datasets.

As text-to-image models become increasingly prevalent, ensuring their equitable performance across diverse cultural contexts is critical. Efforts to mitigate cross-cultural biases have been hampered by trade-offs, including a loss in performance, factual inaccuracies, or offensive outputs. Despite widespread recognition of these challenges, an inability to reliably measure these biases has stalled progress. To address this gap, we introduce CAIRe, an evaluation metric that assesses the degree of cultural relevance of an image, given a user-defined set of labels. Our framework grounds entities and concepts in the image to a knowledge base and uses factual information to give independent graded judgments for each culture label. On a manually curated dataset of culturally salient but rare items built using language models, CAIRe surpasses all baselines by 22% F1 points. Additionally, we construct two datasets for culturally universal concepts, one comprising T2I-generated outputs and another retrieved from naturally occurring data. CAIRe achieves Pearson's correlations of 0.56 and 0.66 with human ratings on these sets, based on a 5-point Likert scale of cultural relevance. This demonstrates its strong alignment with human judgment across diverse image sources.
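The agreement figures quoted above are Pearson correlations between metric scores and human ratings on a 5-point Likert scale. A minimal sketch of that computation follows; it is not the authors' code, the function name `pearson` is chosen here for illustration, and the sample values are made-up placeholders rather than data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT results from the paper:
# one metric score and one 1-5 Likert rating per image.
metric_scores = [0.9, 0.2, 0.6, 0.4, 0.8]
human_ratings = [5, 1, 4, 2, 4]

r = pearson(metric_scores, human_ratings)
print(f"Pearson r = {r:.2f}")
```

A value of r near 1 means the metric ranks and spaces images much as the human raters do; the normalisation by both variances makes the result independent of the scale of either score.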

