CY · AI · LG · Feb 1, 2024

Harm Amplification in Text-to-Image Models

arXiv:2402.01787v3 · 16 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses safety concerns for users and developers of generative AI by identifying and measuring unintended harms in text-to-image models. It is incremental in that it builds on existing safety research.

The paper tackles the problem of text-to-image models generating harmful images from seemingly safe prompts, a phenomenon it terms harm amplification. It develops a framework to quantify this phenomenon and includes an empirical analysis of disparate impacts across genders.

Text-to-image (T2I) models have emerged as a significant advancement in generative AI; however, there exist safety concerns regarding their potential to produce harmful image outputs even when users input seemingly safe prompts. This phenomenon, where T2I models generate harmful representations that were not explicit in the input prompt, poses a potentially greater risk than adversarial prompts, leaving users unintentionally exposed to harms. Our paper addresses this issue by formalizing a definition for this phenomenon which we term harm amplification. We further contribute to the field by developing a framework of methodologies to quantify harm amplification in which we consider the harm of the model output in the context of user input. We then empirically examine how to apply these different methodologies to simulate real-world deployment scenarios including a quantification of disparate impacts across genders resulting from harm amplification. Together, our work aims to offer researchers tools to comprehensively address safety challenges in T2I systems and contribute to the responsible deployment of generative AI models.
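The abstract describes measuring the harm of a model's output relative to the user's input and comparing the result across genders. The sketch below illustrates that idea under several assumptions, none of which come from the paper. A hypothetical text classifier and a hypothetical image classifier each return a harm score in [0, 1]. A sample counts as "amplified" when the image score exceeds the prompt score by a fixed margin. Disparate impact is summarized as the ratio of per-group amplification rates. The names `Sample`, `is_amplified`, `amplification_rates`, and `disparity_ratio`, the 0.2 margin, and the example scores are all illustrative. This is not the authors' exact methodology.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    prompt_harm: float  # hypothetical harm score in [0, 1] from a text classifier
    image_harm: float   # hypothetical harm score in [0, 1] from an image classifier
    group: str          # group label attached to the prompt (e.g. a gender)


def is_amplified(s: Sample, margin: float = 0.2) -> bool:
    # Assumed rule: the output is judged more harmful than the input by more than `margin`.
    return s.image_harm - s.prompt_harm > margin


def amplification_rates(samples, margin: float = 0.2) -> dict:
    # group -> [number of amplified samples, total samples]
    counts = defaultdict(lambda: [0, 0])
    for s in samples:
        counts[s.group][1] += 1
        if is_amplified(s, margin):
            counts[s.group][0] += 1
    return {g: amplified / total for g, (amplified, total) in counts.items()}


def disparity_ratio(rates: dict, a: str, b: str) -> float:
    # Ratio of amplification rates between two groups; 1.0 means parity.
    if rates[b] == 0:
        return float("inf") if rates[a] > 0 else 1.0
    return rates[a] / rates[b]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    samples = [
        Sample(0.1, 0.6, "women"),
        Sample(0.1, 0.2, "women"),
        Sample(0.0, 0.5, "men"),
        Sample(0.3, 0.4, "men"),
        Sample(0.2, 0.3, "men"),
        Sample(0.1, 0.15, "men"),
    ]
    rates = amplification_rates(samples)
    print(rates)                                    # {'women': 0.5, 'men': 0.25}
    print(disparity_ratio(rates, "women", "men"))   # 2.0
```

A fixed margin on the score difference is only one way to judge output harm "in the context of" the input. A real evaluation would also need to account for classifier calibration and for the choice of threshold.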

