Improving Moderation of Online Discussions via Interpretable Neural Models
The work addresses the challenge of scaling moderation on online platforms, but it is incremental, since it builds on existing neural approaches to content moderation.
The paper tackles the problem of automating moderation in online discussions by proposing a neural network method that detects inappropriate comments and highlights offensive parts, evaluated on data from a Slovak news platform.
The growing number of comments makes online discussions difficult for human moderators to handle alone. Antisocial behavior is common and often discourages other users from participating in the discussion. We propose a neural-network-based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments and surface them for moderators to review. Second, we highlight the inappropriate parts within these comments to speed up moderation. We evaluated our method on data from a major Slovak news discussion platform.
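The abstract does not specify the network architecture. One common way to get both a comment-level decision and word-level highlights from a single model is attention pooling, where the attention weights double as an explanation. The Python sketch below illustrates the two-step flow under that assumption only. `TOKEN_SCORES`, `moderate`, the flag threshold, and the "above uniform attention" highlighting rule are all hypothetical stand-ins for a trained network, not the authors' model.

```python
import math
import re
from typing import Optional

# HYPOTHETICAL per-token relevance scores standing in for a trained network.
# A real model would compute these from learned layers, not a lexicon.
TOKEN_SCORES = {"idiot": 4.0, "stupid": 3.0, "shut": 1.5, "up": 0.5}
DEFAULT_SCORE = -1.0  # neutral tokens pull the comment score down slightly


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer (a simplification); drops punctuation."""
    return re.findall(r"\w+", text.lower())


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_comment(text: str) -> tuple[float, list[tuple[str, float]]]:
    """Attention-pooled probability in (0, 1) plus per-token attention weights."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, []
    scores = [TOKEN_SCORES.get(t, DEFAULT_SCORE) for t in tokens]
    attention = softmax(scores)  # higher-scoring tokens receive more weight
    pooled = sum(a * s for a, s in zip(attention, scores))
    prob = 1.0 / (1.0 + math.exp(-pooled))  # sigmoid
    return prob, list(zip(tokens, attention))


def moderate(text: str, flag_threshold: float = 0.5) -> Optional[tuple[float, list[str]]]:
    """Step 1: flag the comment. Step 2: highlight high-attention tokens."""
    prob, weighted = score_comment(text)
    if not weighted or prob < flag_threshold:
        return None  # not surfaced to moderators
    uniform = 1.0 / len(weighted)
    # Highlight tokens that receive more than their uniform share of attention.
    highlights = [tok for tok, a in weighted if a > uniform]
    return prob, highlights


if __name__ == "__main__":
    print(moderate("Thanks for the great article"))
    print(moderate("You are an idiot, shut up"))
```

With these toy scores, the polite comment is not flagged, while the second comment is flagged and only "idiot" is highlighted. Reusing the classifier's own attention for highlighting avoids training a separate span-detection model, at the cost of explanations that are only as faithful as the attention weights themselves.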