LG | AI | Jun 9, 2023

Safety and Fairness for Content Moderation in Generative Models

arXiv:2306.06135v1 | 32 citations | h-index: 19
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenges of deploying generative AI technologies responsibly. Its contribution is incremental: it builds on existing moderation concepts without introducing new methods.

The authors address content moderation for text-to-image generative models by developing a theoretical framework that defines safety and fairness harms and shows how to measure them, enabling data-driven moderation decisions.

With significant advances in generative AI, new technologies are rapidly being deployed with generative components. Generative models are typically trained on large datasets, resulting in model behaviors that can mimic the worst of the content in the training data. Responsible deployment of generative technologies requires content moderation strategies, such as safety input and output filters. Here, we provide a theoretical framework for conceptualizing responsible content moderation of text-to-image generative technologies, including a demonstration of how to empirically measure the constructs we enumerate. We define and distinguish the concepts of safety, fairness, and metric equity, and enumerate example harms that can come in each domain. We then provide a demonstration of how the defined harms can be quantified. We conclude with a summary of how the style of harms quantification we demonstrate enables data-driven content moderation decisions.

