AI, LG
May 19, 2025

Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models

arXiv:2505.13273v1
h-index: 28
Originality: Incremental advance
AI Analysis

The paper addresses fairness and accountability in AI-generated content by identifying biases, but the advance is incremental because it builds on existing uncertainty estimation methods.

The paper tackles the challenge of estimating uncertainty in text-to-image diffusion models by proposing EMoE, a framework that efficiently estimates epistemic uncertainty without additional training. On the COCO dataset, it shows a strong correlation between uncertainty and image quality, and it reveals biases in the training data.

Estimating uncertainty in text-to-image diffusion models is challenging because of their large parameter counts (often exceeding 100 million) and operation in complex, high-dimensional spaces with virtually infinite input possibilities. In this paper, we propose Epistemic Mixture of Experts (EMoE), a novel framework for efficiently estimating epistemic uncertainty in diffusion models. EMoE leverages pre-trained networks without requiring additional training, enabling direct uncertainty estimation from a prompt. We leverage a latent space within the diffusion process that captures epistemic uncertainty better than existing methods. Experimental results on the COCO dataset demonstrate EMoE's effectiveness, showing a strong correlation between uncertainty and image quality. Additionally, EMoE identifies under-sampled languages and regions with higher uncertainty, revealing hidden biases in the training set. This capability demonstrates the relevance of EMoE as a tool for addressing fairness and accountability in AI-generated content.
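The abstract does not spell out how EMoE computes its uncertainty score, so the following is only a minimal sketch of the general idea behind ensemble-style epistemic uncertainty: several pre-trained experts process the same prompt, and their disagreement in a shared latent space is read as uncertainty. The function name `ensemble_disagreement`, the flat-list latent representation, and the choice of mean per-dimension variance are all assumptions made for illustration, not the paper's method.

```python
from statistics import pvariance


def ensemble_disagreement(expert_latents):
    """Return the mean per-dimension variance across expert latent vectors.

    expert_latents: list of equal-length lists of floats, one per expert.
    This is a hypothetical stand-in for an epistemic uncertainty score;
    the actual EMoE estimator may differ.
    """
    if len(expert_latents) < 2:
        raise ValueError("need at least two experts to measure disagreement")
    dim = len(expert_latents[0])
    if dim == 0 or any(len(v) != dim for v in expert_latents):
        raise ValueError("all latent vectors must share one non-zero length")
    # Population variance of each latent dimension across the experts.
    per_dim = [pvariance([v[i] for v in expert_latents]) for i in range(dim)]
    # Average over dimensions to get one scalar score per prompt.
    return sum(per_dim) / dim


# Toy latents (made-up numbers): experts that agree vs. experts that diverge.
familiar_prompt = [[1.0, 2.0], [1.0, 2.0]]
unfamiliar_prompt = [[0.0, 0.0], [2.0, 4.0]]
print(ensemble_disagreement(familiar_prompt))
print(ensemble_disagreement(unfamiliar_prompt))
```

Under this sketch, a prompt on which the experts agree scores zero, while a prompt that pushes them apart scores higher, which is the kind of signal the abstract describes for under-sampled languages and regions.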

