CL · May 20, 2025

MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance

Agam Goyal, Xianyang Zhan, Yilun Chen, Koustuv Saha, Eshwar Chandrasekharan

arXiv:2505.14483v2 · 16.315 citations · h-index: 12 · EMNLP

Originality: Incremental advance

AI Analysis

The work addresses the need for scalable, transparent moderation tools for online communities, though its contribution is an incremental improvement over existing methods.

The paper tackles opaque, non-scalable AI-assisted content moderation by introducing MoMoE, a modular framework that adds post-hoc explanations. On 30 unseen subreddits, its best community-expert and norm-violation-expert variants reach Micro-F1 scores of 0.72 and 0.67, respectively, matching or surpassing fine-tuned baselines.

Large language models (LLMs) have shown great potential in flagging harmful content in online communities. Yet, existing approaches for moderation require a separate model for every community and are opaque in their decision-making, limiting real-world adoption. We introduce Mixture of Moderation Experts (MoMoE), a modular, cross-community framework that adds post-hoc explanations to scalable content moderation. MoMoE orchestrates four operators -- Allocate, Predict, Aggregate, Explain -- and is instantiated as seven community-specialized experts (MoMoE-Community) and five norm-violation experts (MoMoE-NormVio). On 30 unseen subreddits, the best variants obtain Micro-F1 scores of 0.72 and 0.67, respectively, matching or surpassing strong fine-tuned baselines while consistently producing concise and reliable explanations. Although community-specialized experts deliver the highest peak accuracy, norm-violation experts provide steadier performance across domains. These findings show that MoMoE yields scalable, transparent moderation without needing per-community fine-tuning. More broadly, they suggest that lightweight, explainable expert ensembles can guide future NLP and HCI research on trustworthy human-AI governance of online communities.
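To make the four-operator flow concrete, here is a minimal Python sketch of an Allocate, Predict, Aggregate, Explain pipeline. It is an illustration only, not the authors' implementation: the keyword-based experts, the `affinity` weights, the confidence-weighted vote, and the template explanation are all assumptions standing in for the LLM experts and explanation step the abstract describes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical expert interface: comment -> (label, confidence),
# with label in {"remove", "keep"}. Real experts would be LLMs.
Expert = Callable[[str], Tuple[str, float]]


def make_keyword_expert(keywords: List[str]) -> Expert:
    """Toy stand-in for a specialized moderation expert."""
    def predict(comment: str) -> Tuple[str, float]:
        text = comment.lower()
        if any(k in text for k in keywords):
            return ("remove", 0.9)
        return ("keep", 0.6)
    return predict


def allocate(affinity: Dict[str, float], top_k: int) -> Dict[str, float]:
    """Allocate: keep the top_k experts by (assumed) affinity to the
    target community and normalize their weights to sum to 1."""
    chosen = sorted(affinity.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(w for _, w in chosen)
    return {name: w / total for name, w in chosen}


def predict(comment: str, experts: Dict[str, Expert],
            weights: Dict[str, float]) -> Dict[str, Tuple[str, float]]:
    """Predict: query every allocated expert."""
    return {name: experts[name](comment) for name in weights}


def aggregate(votes: Dict[str, Tuple[str, float]],
              weights: Dict[str, float]) -> str:
    """Aggregate: confidence-weighted vote (one possible rule)."""
    scores = {"remove": 0.0, "keep": 0.0}
    for name, (label, conf) in votes.items():
        scores[label] += weights[name] * conf
    return max(scores, key=scores.get)


def explain(label: str, votes: Dict[str, Tuple[str, float]]) -> str:
    """Explain: template-based post-hoc summary of supporting experts."""
    backers = sorted(n for n, (l, _) in votes.items() if l == label)
    return f"Decision '{label}' supported by experts: {', '.join(backers)}"


def moderate(comment: str, experts: Dict[str, Expert],
             affinity: Dict[str, float], top_k: int = 2) -> Tuple[str, str]:
    weights = allocate(affinity, top_k)
    votes = predict(comment, experts, weights)
    label = aggregate(votes, weights)
    return label, explain(label, votes)


# Illustrative experts and affinities (made-up names and values).
EXPERTS = {
    "harassment": make_keyword_expert(["idiot", "loser"]),
    "spam": make_keyword_expert(["buy now", "free money"]),
    "off_topic": make_keyword_expert(["unrelated"]),
}
AFFINITY = {"harassment": 0.5, "spam": 0.3, "off_topic": 0.2}

if __name__ == "__main__":
    print(moderate("You are an idiot", EXPERTS, AFFINITY))
    print(moderate("Nice photo!", EXPERTS, AFFINITY))
```

The point of the sketch is the separation of concerns: because each operator is a small function with a plain interface, the expert pool or the aggregation rule can be swapped without touching the rest of the pipeline, which is the kind of modularity the abstract credits for avoiding per-community fine-tuning.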
