CY · CL · Apr 14

The Enforcement and Feasibility of Hate Speech Moderation on Twitter

Manuel Tonneau, Dylan Thurgood, Diyi Liu, Niyati Malhotra, Victor Orozco-Olvera, Ralph Schroeder, Scott A. Hale, Manoel Horta Ribeiro, Paul Röttger, Samuel P. Fraiberger

Oxford

arXiv:2604.12289 · 76.01 citations · h-index: 9

AI Analysis

For policymakers and platform regulators, this study suggests that Twitter's hate speech moderation failures cannot be attributed to technical infeasibility alone and also reflect choices about how moderation resources are allocated.

An audit of 540,000 tweets across eight languages found that 80% of hateful tweets remained online five months after posting and were no more likely to be removed than non-hateful tweets. Simulations of a human-AI moderation pipeline indicate that substantially reducing exposure to hate speech is economically feasible at a cost below existing regulatory penalties, suggesting that enforcement gaps reflect institutional choices and not only technical limits.

Online hate speech is associated with substantial social harms, yet it remains unclear how consistently platforms enforce hate speech policies or whether enforcement is feasible at scale. We address these questions through a global audit of hate speech moderation on Twitter (now X). Using a complete 24-hour snapshot of public tweets, we construct representative samples comprising 540,000 tweets annotated for hate speech by trained annotators across eight major languages. Five months after posting, 80% of hateful tweets remain online, including explicitly violent hate speech. Such tweets are no more likely to be removed than non-hateful tweets, with neither severity nor visibility increasing the likelihood of removal. We then examine whether these enforcement gaps reflect technical limits of large-scale moderation systems. While fully automated detection systems cannot reliably identify hate speech without generating large numbers of false positives, they effectively prioritize likely violations for human review. Simulations of a human-AI moderation pipeline indicate that substantially reducing user exposure to hate speech is economically feasible at a cost below existing regulatory penalties. These results suggest that the persistence of online hate cannot be explained by technical constraints alone but also reflects institutional choices in the allocation of moderation resources.
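The human-AI pipeline described in the abstract, where a classifier ranks tweets and human reviewers check only the top of the queue, can be illustrated with a toy simulation. The sketch below is not the authors' simulation: the function name `simulate_pipeline`, the score distributions, the heavy-tailed view counts, the prevalence, the per-review cost, and the assumption that human reviewers never err are all hypothetical choices made for illustration, and the data are synthetic.

```python
import random


def simulate_pipeline(n_tweets=100_000, prevalence=0.01, review_budget=5_000,
                      cost_per_review=0.10, seed=0):
    """Toy human-AI moderation pipeline on synthetic data (all values hypothetical)."""
    rng = random.Random(seed)
    tweets = []
    for _ in range(n_tweets):
        hateful = rng.random() < prevalence
        # Assumed imperfect classifier: hateful tweets score higher on average,
        # but the two score distributions overlap.
        score = rng.gauss(1.5 if hateful else 0.0, 1.0)
        # Assumed heavy-tailed exposure: a few tweets collect most of the views.
        views = rng.paretovariate(1.5)
        tweets.append((score, hateful, views))

    # The classifier only prioritizes: the highest-scoring tweets go to human review.
    tweets.sort(key=lambda t: t[0], reverse=True)
    reviewed = tweets[:review_budget]

    # Assumption: reviewers are perfectly accurate, so every hateful tweet
    # that reaches the queue is removed and no non-hateful tweet is.
    total_hate_views = sum(v for _, h, v in tweets if h)
    removed_hate_views = sum(v for _, h, v in reviewed if h)
    hateful_in_queue = sum(1 for _, h, _ in reviewed if h)

    return {
        "review_cost": review_budget * cost_per_review,
        "precision_in_queue": hateful_in_queue / review_budget if review_budget else 0.0,
        "exposure_reduction": (removed_hate_views / total_hate_views
                               if total_hate_views else 0.0),
    }


if __name__ == "__main__":
    for budget in (1_000, 5_000, 20_000):
        print(budget, simulate_pipeline(review_budget=budget))
```

Sweeping `review_budget` traces out a cost versus exposure-reduction curve, which is the kind of trade-off one would compare against a regulatory penalty. The numbers this toy model produces say nothing about the paper's actual estimates.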
