CY CL May 2

Who Decides What Is Harmful? Content Moderation Policy Through A Multi-Agent Personalised Inference Framework

Ewelina Gajewska, Michal Wawer, Katarzyna Budzynska, Jaroslaw A. Chudziak

arXiv:2605.01416 68.8 h-index: 4

AI Analysis

For online platforms and policymakers, this framework addresses the challenge of subjective harm perception in content moderation by enabling personalized filtering, though it is an incremental application of existing multi-agent LLM techniques.

The paper proposes an LLM-based multi-agent personalized inference framework for content moderation that filters content based on individual user sensitivity profiles, achieving up to a 32% improvement in accuracy over non-personalized baselines.

The increasing scale and complexity of online platforms raise critical policy questions around harmful content, digital well-being, and user autonomy. Traditional content moderation systems rely on centralised, top-down rules, often failing to accommodate the subjective nature of harm perception. This paper proposes an LLM-based multi-agent personalised inference framework that filters content based on the unique sensitivity profiles of individual users. Our architecture combines domain-specific Expert Agents, a Manager Agent for orchestrating content analysis and agent selection, and a Ghost Profile Agent for simulating user perspectives, to inform moderation decisions. Evaluated against a range of non-personalised baselines, the system demonstrates up to a 32% improvement in accuracy, showing increased alignment with individual user sensitivities. Beyond technical performance, our framework offers policy-relevant insights for platform governance, providing a scalable way to reconcile moderation policies with societal and individual digital rights.
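The three agent roles named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the keyword rules stand in for LLM calls, and every name here (`UserProfile`, `TRIGGERS`, `manager_select`, `ghost_profile_flags`, `moderate`), the severity scores, and the tolerance thresholds are hypothetical, chosen only to show how a per-user profile might change the outcome for the same piece of content.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    # Hypothetical sensitivity profile: domain -> tolerance in [0, 1].
    # A lower tolerance means the user is more sensitive to that domain.
    tolerance: dict[str, float]


# Stand-ins for domain-specific Expert Agents (assumed: one severity score each).
def violence_expert(text: str) -> float:
    return 0.8 if "fight" in text.lower() else 0.0


def profanity_expert(text: str) -> float:
    return 0.6 if "damn" in text.lower() else 0.0


EXPERTS = {"violence": violence_expert, "profanity": profanity_expert}
TRIGGERS = {"violence": ("fight",), "profanity": ("damn",)}


def manager_select(text: str) -> list[str]:
    """Stand-in Manager Agent: choose which experts to consult for this text."""
    lowered = text.lower()
    return [d for d, words in TRIGGERS.items() if any(w in lowered for w in words)]


def ghost_profile_flags(profile: UserProfile, domain: str, severity: float) -> bool:
    """Stand-in Ghost Profile Agent: would this user consider the content harmful?"""
    return severity > profile.tolerance.get(domain, 0.5)


def moderate(text: str, profile: UserProfile) -> str:
    """Return 'hide' if any consulted expert's score exceeds the user's tolerance."""
    for domain in manager_select(text):
        severity = EXPERTS[domain](text)
        if ghost_profile_flags(profile, domain, severity):
            return "hide"
    return "show"


sensitive = UserProfile({"violence": 0.2, "profanity": 0.9})
tolerant = UserProfile({"violence": 0.9, "profanity": 0.9})
```

With these made-up profiles, `moderate("A fight broke out", sensitive)` hides the post while the same call with `tolerant` shows it, which is the kind of per-user divergence a non-personalised baseline cannot express.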
