CL · Oct 7, 2020

Fortifying Toxic Speech Detectors Against Veiled Toxicity

arXiv:2010.03154v11014 citations
Originality: Incremental advance
AI Analysis

This paper addresses veiled toxicity detection for online platforms and content moderators, offering a way to strengthen detectors without large labeled datasets. The advance is incremental because it builds on existing detectors.

The paper tackles the failure of modern toxic speech detectors to recognize disguised offensive language, such as adversarial attacks and implicit bias. It proposes a framework that uses a handful of probing examples to surface many more disguised offenses, then adds them to the detector's training data. The result is improved robustness to veiled toxicity while detection of overt toxicity is maintained.

Modern toxic speech detectors are incompetent in recognizing disguised offensive language, such as adversarial attacks that deliberately avoid known toxic lexicons, or manifestations of implicit bias. Building a large annotated dataset for such veiled toxicity can be very expensive. In this work, we propose a framework aimed at fortifying existing toxic speech detectors without a large labeled corpus of veiled toxicity. Just a handful of probing examples are used to surface orders of magnitude more disguised offenses. We augment the toxic speech detector's training data with these discovered offensive examples, thereby making it more robust to veiled toxicity while preserving its utility in detecting overt toxicity.
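
The probe-and-augment loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the probing examples surface further offenses, so the retrieval step here is an assumption: a plain bag-of-words cosine similarity stands in for whatever scoring the paper actually uses. The function names (`embed`, `surface_candidates`, `augment`), the threshold, and the example texts are all hypothetical.

```python
from collections import Counter
from math import sqrt


def embed(text):
    """Toy bag-of-words vector (assumption: a stand-in for a real model representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def surface_candidates(probes, pool, threshold=0.5):
    """Return pool texts whose best similarity to any probe meets the threshold."""
    probe_vecs = [embed(p) for p in probes]
    found = []
    for text in pool:
        vec = embed(text)
        score = max((cosine(vec, pv) for pv in probe_vecs), default=0.0)
        if score >= threshold:
            found.append(text)
    return found


def augment(train, candidates, label="toxic"):
    """Append surfaced candidates with the given label, skipping texts already in train."""
    seen = {text for text, _ in train}
    return train + [(c, label) for c in candidates if c not in seen]


# Hypothetical data: one probing example and a small unlabeled pool.
probes = ["people like you never learn"]
pool = [
    "people like you will never learn anything",
    "the weather is lovely today",
    "you people never learn",
]
train = [("you are an idiot", "toxic"), ("have a nice day", "non-toxic")]

candidates = surface_candidates(probes, pool)
augmented = augment(train, candidates)
```

In this toy run, the two pool sentences that share most of their words with the probe are surfaced and appended to the training set, while the unrelated sentence is left out. A real system would replace `embed` and the threshold with the detector's own signals and retrain the detector on the augmented set.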

Code Implementations: 1 repo