Are Chess Discussions Racist? An Adversarial Hate Speech Data Set
This paper addresses the challenge of adversarial examples in hate-speech detection for social media platforms, though its contribution is incremental because it focuses on a single domain.
The paper tackles the problem of off-the-shelf hate-speech classifiers misclassifying benign chess discussions as hate speech. Drawing on a corpus of 681,995 YouTube comments from chess channels, it releases a data set of 1,000 annotated comments that existing classifiers wrongly flagged as hate speech, and it highlights the underlying issue of color polysemy.
On June 28, 2020, while Antonio Radić was presenting a chess podcast on Grandmaster Hikaru Nakamura, his YouTube handle was blocked because it contained "harmful and dangerous" content. YouTube did not give a more specific reason, and the channel was reinstated within 24 hours. However, Radić speculated that, given the current political situation, a reference to "black against white", albeit in the context of chess, earned him this temporary ban. In this paper, via a substantial corpus of 681,995 comments on 8,818 YouTube videos hosted by five highly popular chess-focused YouTube channels, we ask the following research question: \emph{how robust are off-the-shelf hate-speech classifiers to out-of-domain adversarial examples?} We release a data set of 1,000 annotated comments where existing hate speech classifiers misclassified benign chess discussions as hate speech. We conclude with an intriguing analogy result on racial bias, with our findings pointing to the broader challenge of color polysemy.