CYCL Apr 14

The Enforcement and Feasibility of Hate Speech Moderation on Twitter

Oxford
arXiv:2604.12289 76.01 citations h-index: 9
AI Analysis

For policymakers and platform regulators, this study suggests that Twitter's hate speech moderation failures cannot be explained by technical infeasibility alone and also reflect resource allocation choices.

An audit of 540,000 tweets across eight languages found that 80% of hateful tweets remained online after five months and were no more likely to be removed than non-hateful tweets. Simulations of a human-AI moderation pipeline suggest that substantially reducing exposure to hate speech is economically feasible at a cost below existing regulatory penalties, indicating that enforcement gaps reflect institutional choices as well as technical limits.

Online hate speech is associated with substantial social harms, yet it remains unclear how consistently platforms enforce hate speech policies or whether enforcement is feasible at scale. We address these questions through a global audit of hate speech moderation on Twitter (now X). Using a complete 24-hour snapshot of public tweets, we construct representative samples comprising 540,000 tweets annotated for hate speech by trained annotators across eight major languages. Five months after posting, 80% of hateful tweets remain online, including explicitly violent hate speech. Such tweets are no more likely to be removed than non-hateful tweets, with neither severity nor visibility increasing the likelihood of removal. We then examine whether these enforcement gaps reflect technical limits of large-scale moderation systems. While fully automated detection systems cannot reliably identify hate speech without generating large numbers of false positives, they effectively prioritize likely violations for human review. Simulations of a human-AI moderation pipeline indicate that substantially reducing user exposure to hate speech is economically feasible at a cost below existing regulatory penalties. These results suggest that the persistence of online hate cannot be explained by technical constraints alone but also reflects institutional choices in the allocation of moderation resources.
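
The human-AI pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation: the `Tweet` fields, the `simulate_review` function, the review budget, the per-review cost, and the toy data are all hypothetical and chosen only to show the idea. A classifier score ranks tweets, human reviewers inspect only the top-ranked ones, and only tweets the reviewers confirm as hateful are removed.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    score: float   # hypothetical classifier score in [0, 1]
    views: int     # hypothetical exposure count
    hateful: bool  # label a human reviewer would assign on inspection


def simulate_review(tweets, budget, cost_per_review):
    """Send the `budget` highest-scoring tweets to human review.

    Returns (share of hateful exposure removed, total review cost).
    Only tweets confirmed as hateful by the reviewer are removed, so
    classifier false positives cost review time but remove nothing.
    """
    ranked = sorted(tweets, key=lambda t: t.score, reverse=True)
    reviewed = ranked[:budget]
    total_exposure = sum(t.views for t in tweets if t.hateful)
    removed_exposure = sum(t.views for t in reviewed if t.hateful)
    reduction = removed_exposure / total_exposure if total_exposure else 0.0
    return reduction, len(reviewed) * cost_per_review


# Toy data, invented for illustration only.
toy = [
    Tweet(score=0.95, views=1000, hateful=True),
    Tweet(score=0.90, views=50, hateful=False),   # classifier false positive
    Tweet(score=0.80, views=500, hateful=True),
    Tweet(score=0.30, views=500, hateful=True),   # missed by prioritization
    Tweet(score=0.10, views=2000, hateful=False),
]

reduction, cost = simulate_review(toy, budget=3, cost_per_review=0.5)
print(f"exposure reduction: {reduction:.0%}, review cost: {cost}")
```

Raising `budget` trades higher review cost for greater exposure reduction. The abstract reports that this trade-off was evaluated against existing regulatory penalties, but the specific cost figures are not given here.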
