CL · Nov 29, 2023

DisCGen: A Framework for Discourse-Informed Counterspeech Generation

arXiv:2311.18147v1 · 128 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses automated counterspeech generation for social media moderation. Its contribution is incremental: it builds on existing discourse theories and dataset collection methods.

The authors propose a discourse-informed framework for generating effective and appropriate counterspeech against hateful content on social media. They manually annotate a dataset of 3.9k Reddit comment pairs and show that large language models can produce contextually grounded counterspeech. In their human evaluation, the discourse-informed approaches can act as a safeguard against critical failures of discourse-agnostic models.

Counterspeech can be an effective method for battling hateful content on social media. Automated counterspeech generation can aid in this process. Generated counterspeech, however, can be viable only when grounded in the context of topic, audience and sensitivity as these factors influence both the efficacy and appropriateness. In this work, we propose a novel framework based on theories of discourse to study the inferential links that connect counter speeches to the hateful comment. Within this framework, we propose: i) a taxonomy of counterspeech derived from discourse frameworks, and ii) discourse-informed prompting strategies for generating contextually-grounded counterspeech. To construct and validate this framework, we present a process for collecting an in-the-wild dataset of counterspeech from Reddit. Using this process, we manually annotate a dataset of 3.9k Reddit comment pairs for the presence of hatespeech and counterspeech. The positive pairs are annotated for 10 classes in our proposed taxonomy. We annotate these pairs with paraphrased counterparts to remove offensiveness and first-person references. We show that by using our dataset and framework, large language models can generate contextually-grounded counterspeech informed by theories of discourse. According to our human evaluation, our approaches can act as a safeguard against critical failures of discourse-agnostic models.

Code Implementations: 1 repo
