CL · AI · Jul 7, 2025

ModelCitizens: Representing Community Voices in Online Safety

arXiv:2507.05455v2 · 4 citations · h-index: 16 · Has Code · EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses the need for more inclusive content moderation by incorporating community-specific perspectives, though the advance is incremental because it builds on existing model architectures.

The paper tackles subjective toxicity detection in online content by building a dataset with diverse community annotations and context-augmented posts. Models finetuned on this dataset outperform GPT-o4-mini by 5.5% on in-distribution evaluations.

Automatic toxic language detection is critical for creating safe, inclusive online spaces. However, it is a highly subjective task, with perceptions of toxic language shaped by community norms and lived experience. Existing toxicity detection models are typically trained on annotations that collapse diverse annotator perspectives into a single ground truth, erasing important context-specific notions of toxicity such as reclaimed language. To address this, we introduce MODELCITIZENS, a dataset of 6.8K social media posts and 40K toxicity annotations across diverse identity groups. To capture the role of conversational context on toxicity, typical of social media posts, we augment MODELCITIZENS posts with LLM-generated conversational scenarios. State-of-the-art toxicity detection tools (e.g. OpenAI Moderation API, GPT-o4-mini) underperform on MODELCITIZENS, with further degradation on context-augmented posts. Finally, we release LLAMACITIZEN-8B and GEMMACITIZEN-12B, LLaMA- and Gemma-based models finetuned on MODELCITIZENS, which outperform GPT-o4-mini by 5.5% on in-distribution evaluations. Our findings highlight the importance of community-informed annotation and modeling for inclusive content moderation. The data, models and code are available at https://github.com/asuvarna31/modelcitizens.
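
The abstract's point about collapsing annotator perspectives into a single ground truth can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy annotations, the group names `group_a` and `group_b`, the binary labels, and the majority-vote rule are hypothetical and are not taken from the MODELCITIZENS data format or the authors' aggregation procedure.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations: (post_id, annotator_group, label).
# Label 1 = toxic, 0 = not toxic. Not real MODELCITIZENS data.
annotations = [
    ("p1", "group_a", 0),
    ("p1", "group_a", 0),
    ("p1", "group_b", 1),
    ("p1", "group_b", 1),
    ("p1", "group_b", 1),
]

def majority(labels):
    """Majority vote over binary labels; ties resolve to toxic (an arbitrary choice)."""
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0

def collapsed_labels(rows):
    """One label per post, ignoring which group each annotator belongs to."""
    by_post = defaultdict(list)
    for post_id, _group, label in rows:
        by_post[post_id].append(label)
    return {post: majority(labels) for post, labels in by_post.items()}

def group_labels(rows):
    """One label per (post, group), so disagreement between groups stays visible."""
    by_key = defaultdict(list)
    for post_id, group, label in rows:
        by_key[(post_id, group)].append(label)
    return {key: majority(labels) for key, labels in by_key.items()}

print(collapsed_labels(annotations))
print(group_labels(annotations))
```

In this toy case the collapsed label marks the post as toxic, while the per-group view shows that one group's annotators unanimously judged it non-toxic. That kind of disagreement is what a single ground-truth label hides.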

