CV · LG · Oct 26, 2025

FairJudge: MLLM Judging for Social Attributes and Prompt Image Alignment

arXiv:2510.22827v2 · 3 citations · h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the need for reliable, reproducible fairness audits of text-to-image systems. The advance is incremental because it builds on existing evaluation methods.

The paper tackles the problem of evaluating text-to-image systems for prompt alignment and treatment of social attributes. It introduces FairJudge, a lightweight protocol that uses multimodal LLMs as judges. In the reported experiments, the judge models outperform contrastive and face-centric baselines on demographic prediction and improve mean alignment while maintaining high profession accuracy.

Text-to-image (T2I) systems lack simple, reproducible ways to evaluate how well images match prompts and how models treat social attributes. Common proxies -- face classifiers and contrastive similarity -- reward surface cues, lack calibrated abstention, and miss attributes only weakly visible (for example, religion, culture, disability). We present FairJudge, a lightweight protocol that treats instruction-following multimodal LLMs as fair judges. It scores alignment with an explanation-oriented rubric mapped to [-1, 1]; constrains judgments to a closed label set; requires evidence grounded in the visible content; and mandates abstention when cues are insufficient. Unlike CLIP-only pipelines, FairJudge yields accountable, evidence-aware decisions; unlike mitigation that alters generators, it targets evaluation fairness. We evaluate gender, race, and age on FairFace, PaTA, and FairCoT; extend to religion, culture, and disability; and assess profession correctness and alignment on IdenProf, FairCoT-Professions, and our new DIVERSIFY-Professions. We also release DIVERSIFY, a 469-image corpus of diverse, non-iconic scenes. Across datasets, judge models outperform contrastive and face-centric baselines on demographic prediction and improve mean alignment while maintaining high profession accuracy, enabling more reliable, reproducible fairness audits.
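The four constraints named in the abstract (a rubric mapped to [-1, 1], a closed label set, grounded evidence, and mandatory abstention) can be illustrated with a small post-processing step applied to a judge model's reply. The following is only a sketch under stated assumptions, not the authors' implementation: the JSON reply format, the field names `label`, `evidence`, and `rubric_score`, the 5-point rubric, the linear mapping, and the example profession labels are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical closed label set; the paper's actual labels may differ.
PROFESSION_LABELS = {"doctor", "chef", "engineer"}
ABSTAIN = "abstain"


@dataclass
class Judgment:
    label: str                  # a label from the closed set, or ABSTAIN
    evidence: str               # visible cue cited by the judge ("" if none)
    alignment: Optional[float]  # prompt-image alignment in [-1, 1], or None


def rubric_to_alignment(score: int, lo: int = 1, hi: int = 5) -> float:
    """Linearly map an integer rubric score in [lo, hi] to [-1, 1].

    The 1-5 scale and the linear mapping are assumptions for illustration.
    """
    if not lo <= score <= hi:
        raise ValueError(f"rubric score {score} outside [{lo}, {hi}]")
    return 2.0 * (score - lo) / (hi - lo) - 1.0


def parse_judgment(raw: str, labels: set) -> Judgment:
    """Validate a judge reply (assumed to be a JSON object)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Judgment(ABSTAIN, "", None)
    if not isinstance(data, dict):
        return Judgment(ABSTAIN, "", None)

    label = str(data.get("label", "")).strip().lower()
    evidence = str(data.get("evidence", "")).strip()
    # Abstain when the label is outside the closed set or when no
    # visible evidence is cited for it.
    if label not in labels or not evidence:
        label = ABSTAIN

    score = data.get("rubric_score")
    is_valid_score = (
        isinstance(score, int)
        and not isinstance(score, bool)
        and 1 <= score <= 5
    )
    alignment = rubric_to_alignment(score) if is_valid_score else None
    return Judgment(label, evidence, alignment)
```

For example, a reply such as `{"label": "Chef", "evidence": "white toque and apron", "rubric_score": 4}` would be accepted as `chef` with an alignment of 0.5, whereas a reply with an out-of-set label or an empty evidence field would be recorded as an abstention. Treating malformed replies as abstentions rather than errors is a design choice of this sketch, so that a single unparsable output does not halt an audit run.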

