CL AI Oct 22, 2020

Reducing Unintended Identity Bias in Russian Hate Speech Detection

arXiv:2010.11666v1998 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses bias reduction in hate speech detection for Russian online communities, but its contribution is incremental because it builds on existing methods.

The paper tackles unintended identity bias in Russian hate speech detection models, where non-toxic identity terms trigger false positives. It proposes simple techniques to reduce this bias, such as generating training data with language models and applying word dropout to identity terms. The size of the resulting improvement is not specified in this summary.

Toxicity has become a grave problem for many online communities and has been growing across many languages, including Russian. Hate speech creates an environment of intimidation and discrimination, and may even incite real-world violence. Researchers and social platforms alike have long focused on developing models to detect toxicity in online communication. A common problem with these models is bias towards certain words (e.g., woman, black, jew) that are not toxic but act as triggers for the classifier due to model caveats. In this paper, we describe our efforts towards classifying hate speech in Russian and propose simple techniques for reducing unintended bias, such as generating training data with language models, using terms and words related to protected identities as context, and applying word dropout to such words.
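The word dropout idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the term list, the dropout probability `p`, the `<unk>` mask token, and the function name `identity_word_dropout` are all assumptions made for illustration, and the abstract does not say whether dropped words are masked or deleted.

```python
import random

# Hypothetical identity lexicon, taken from the examples in the abstract.
# The paper's actual (Russian) term list is not given here.
IDENTITY_TERMS = {"woman", "black", "jew"}


def identity_word_dropout(tokens, identity_terms=IDENTITY_TERMS, p=0.5,
                          mask_token="<unk>", rng=None):
    """Replace each identity term with mask_token with probability p.

    Applied to training examples only, so the classifier sees the same
    context both with and without the identity term and is less able
    to treat the term itself as a toxicity signal.
    """
    rng = rng or random.Random()
    out = []
    for tok in tokens:
        # Only identity terms are candidates for dropout; all other
        # tokens pass through unchanged.
        if tok.lower() in identity_terms and rng.random() < p:
            out.append(mask_token)
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    sentence = "she is a woman and a doctor".split()
    print(identity_word_dropout(sentence, p=0.5, rng=random.Random(0)))
```

Masking rather than deleting keeps the sequence length unchanged, which is convenient for a sketch. Deleting the token instead would be an equally plausible reading of "word dropout".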

