CV | CL | May 18, 2023

Inspecting the Geographical Representativeness of Images from Text-to-Image Models

arXiv:2305.11080v1 | 55 citations
Originality: Incremental advance
AI Analysis

This paper addresses geographical bias in text-to-image models, which can affect generative art, digital marketing, and data augmentation. Its contribution is incremental: it quantifies the bias and highlights the need for more geographically inclusive models.

The paper measures the geographical representativeness of images generated by DALL-E 2 and Stable Diffusion. For underspecified inputs, the images most reflect the United States, followed by India, with average scores below 3 out of 5 for all other countries. Specifying the country name raises representativeness by 1.44 points on average for DALL-E 2 and by 0.75 points for Stable Diffusion.

Recent progress in generative models has resulted in models that produce both realistic and relevant images for most textual inputs. These models are being used to generate millions of images every day, and hold the potential to drastically impact areas such as generative art, digital marketing and data augmentation. Given their outsized impact, it is important to ensure that the generated content reflects the artifacts and surroundings across the globe, rather than over-representing certain parts of the world. In this paper, we measure the geographical representativeness of images of common nouns (e.g., a house) generated by the DALL-E 2 and Stable Diffusion models, using a crowdsourced study comprising 540 participants across 27 countries. For deliberately underspecified inputs without country names, the generated images most reflect the surroundings of the United States, followed by India, and the top generations rarely reflect the surroundings of any other country (average score less than 3 out of 5). Specifying the country name in the input increases the representativeness by 1.44 points on average for DALL-E 2 and by 0.75 points for Stable Diffusion; however, the overall scores for many countries remain low, highlighting the need for future models to be more geographically inclusive. Lastly, we examine the feasibility of quantifying the geographical representativeness of generated images without conducting user studies.
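The arithmetic behind the reported figures (a mean score on a 5-point scale per country, and the average gain from adding the country name to the prompt) can be sketched as follows. This is only an illustration under stated assumptions: the ratings, the placeholder country labels "A" and "B", and the function names are hypothetical and are not taken from the paper's data or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings, NOT from the study: (country, prompt type, score on a 1-5 scale).
ratings = [
    ("A", "underspecified", 2),
    ("A", "underspecified", 3),
    ("A", "country_specified", 4),
    ("A", "country_specified", 4),
    ("B", "underspecified", 4),
    ("B", "underspecified", 5),
    ("B", "country_specified", 5),
    ("B", "country_specified", 5),
]


def mean_scores(rows):
    """Average the participant scores for each (country, prompt type) pair."""
    buckets = defaultdict(list)
    for country, prompt_type, score in rows:
        buckets[(country, prompt_type)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


def specification_gain(rows):
    """Per-country gain from naming the country, plus the average gain across countries."""
    scores = mean_scores(rows)
    countries = sorted({country for country, _ in scores})
    gains = {
        c: scores[(c, "country_specified")] - scores[(c, "underspecified")]
        for c in countries
    }
    return gains, mean(list(gains.values()))


gains, average_gain = specification_gain(ratings)
print(gains, average_gain)
```

In this toy data, country "A" gains more from the explicit country name than country "B", whose underspecified images already score high. The abstract describes the same pattern of comparison, with the average gain computed separately for each model.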

