CL | Sep 5, 2023

Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

arXiv:2309.02311v1194 citations | h-index: 26 | Has Code
Originality: Highly original
AI Analysis

This work addresses the challenge of generating effective counter narratives for online hate speech. The contribution is incremental: it builds on existing methods to improve the portability and diversity of the generated narratives.

The paper tackles overfitting in pretrained language models used for hate speech counter narrative generation. It introduces attention regularization to improve generalization, and the regularized models outperform state-of-the-art approaches in most cases, especially for hateful targets unseen during training.

Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narrative generation. Overfitting to training-specific terms is then discouraged, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases, both in terms of automatic metrics and human evaluation, especially when hateful targets are not present in the training data. This work paves the way for better and more flexible counter-speech generation models, a task for which datasets are highly challenging to produce.
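
The abstract does not say which two regularizers are used. As a rough illustration of the general idea only, the sketch below adds an entropy-based penalty on attention weights to a task loss, so that attention concentrated on a few tokens costs more than attention spread across the input. The function names, the `alpha` weight, and the choice of an entropy penalty are all assumptions made for illustration, not the paper's method.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def entropy_regularized_loss(task_loss, attention_rows, alpha=0.01):
    """Combine a task loss with an attention-entropy term.

    attention_rows: list of attention distributions, each summing to 1.
    alpha: hypothetical regularization strength (not taken from the source).
    """
    mean_entropy = sum(attention_entropy(r) for r in attention_rows) / len(attention_rows)
    # Subtracting the mean entropy rewards diffuse attention and makes
    # attention that collapses onto a few tokens comparatively costly.
    return task_loss - alpha * mean_entropy


# Toy comparison: same task loss, different attention shapes.
peaked = [[0.97, 0.01, 0.01, 0.01]]
uniform = [[0.25, 0.25, 0.25, 0.25]]
loss_peaked = entropy_regularized_loss(2.0, peaked, alpha=0.1)
loss_uniform = entropy_regularized_loss(2.0, uniform, alpha=0.1)
```

With an identical task loss, the uniform attention row yields the lower total, which is the pressure such a term would apply during fine-tuning: the model gains less from fixating on a handful of training-specific terms.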

Code Implementations: 1 repo
