CL · Oct 3, 2022

Hypothesis Engineering for Zero-Shot Hate Speech Detection

arXiv:2210.00910v1582 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This work addresses hate speech detection without labeled data, which matters for users who need scalable content moderation. It is an incremental advance that builds on existing NLI-based approaches.

The paper tackles zero-shot hate speech detection by proposing a method that combines multiple hypotheses to improve natural language inference (NLI)-based classification, achieving accuracy improvements of 7.9 percentage points on HateCheck and 10.0 percentage points on ETHOS.

Standard approaches to hate speech detection rely on sufficient available hate speech annotations. Extending previous work that repurposes natural language inference (NLI) models for zero-shot text classification, we propose a simple approach that combines multiple hypotheses to improve English NLI-based zero-shot hate speech detection. We first conduct an error analysis for vanilla NLI-based zero-shot hate speech detection and then develop four strategies based on this analysis. The strategies use multiple hypotheses to predict various aspects of an input text and combine these predictions into a final verdict. We find that the zero-shot baseline used for the initial error analysis already outperforms commercial systems and fine-tuned BERT-based hate speech detection models on HateCheck. The combination of the proposed strategies further increases the zero-shot accuracy of 79.4% on HateCheck by 7.9 percentage points (pp), and the accuracy of 69.6% on ETHOS by 10.0pp.
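The idea of combining several hypotheses into one verdict can be sketched as follows. This is a minimal illustration, not the paper's implementation: the three hypothesis wordings, the `combine` rule, the 0.5 threshold, and the keyword-based `toy_scorer` are all assumptions made for this example. In practice the scorer would presumably wrap an NLI model's entailment probability.

```python
from typing import Callable, Dict

# An entailment scorer maps (premise, hypothesis) to P(entailment) in [0, 1].
Scorer = Callable[[str, str], float]

# Hypothetical hypothesis wordings, chosen for illustration only.
MAIN = "This text contains hate speech."
ABOUT_GROUP = "This text is about a group of people."
COUNTER = "This text condemns hate speech."


def combine(text: str, score: Scorer, threshold: float = 0.5) -> bool:
    """Predict several aspects of the text and merge them into one verdict.

    Assumed rule: flag the text only if the main hypothesis is entailed,
    the text is about a group, and it is not counter-speech.
    """
    preds: Dict[str, bool] = {
        h: score(text, h) >= threshold for h in (MAIN, ABOUT_GROUP, COUNTER)
    }
    return preds[MAIN] and preds[ABOUT_GROUP] and not preds[COUNTER]


def toy_scorer(premise: str, hypothesis: str) -> float:
    """Keyword stand-in for an NLI model, so the sketch runs offline."""
    text = premise.lower()
    cues = {
        MAIN: ("hate",),
        ABOUT_GROUP: ("people", "they"),
        COUNTER: ("wrong to", "stop"),
    }
    return 0.9 if any(cue in text for cue in cues[hypothesis]) else 0.1


if __name__ == "__main__":
    for sample in (
        "I hate those people.",
        "It is wrong to hate those people.",
        "I hate Mondays.",
    ):
        print(sample, "->", combine(sample, toy_scorer))
```

The auxiliary hypotheses act as filters on the main prediction: the second sample matches the main hypothesis but is rejected as counter-speech, and the third is rejected because it is not about a group of people.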

Code Implementations: 1 repo