CL CY
Jul 19, 2021

Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech

Margherita Fanton, Helena Bonaldi, Serra Sinem Tekiroglu, Marco Guerini

arXiv:2107.08720v1

Originality: Incremental advance

AI Analysis

This work addresses the shortage of datasets for counter narrative generation in NLP, a task relevant to building healthier online communities. It is an incremental advance because it builds on existing data collection efforts.

The paper addresses the difficulty of building hate speech/counter narrative datasets that are both large and high quality. It proposes a human-in-the-loop methodology in which a generative language model is refined iteratively on data from earlier loops, while experts review or post-edit the generated samples. The authors report that the approach is scalable and cost-effective and that, to their knowledge, the resulting dataset is the only expert-based multi-target hate speech/counter narrative dataset available.

Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation, they fall short in reaching either high-quality and/or high-quantity. In this paper, we propose a novel human-in-the-loop data collection methodology in which a generative language model is refined iteratively by using its own data from the previous loops to generate new training samples that experts review and/or post-edit. Our experiments comprised several loops including dynamic variations. Results show that the methodology is scalable and facilitates diverse, novel, and cost-effective data collection. To our knowledge, the resulting dataset is the only expert-based multi-target HS/CN dataset available to the community.
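The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `collect_with_experts` and the toy stand-ins for fine-tuning, generation, and expert review are all hypothetical names introduced here, and the placeholder strings stand in for real HS/CN pairs.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (hate speech text, counter narrative text)


def collect_with_experts(
    seed: List[Pair],
    fine_tune: Callable[[List[Pair]], object],
    generate: Callable[[object], List[Pair]],
    review: Callable[[Pair], Optional[Pair]],
    n_loops: int,
) -> List[Pair]:
    """Grow a dataset over several loops (hypothetical sketch).

    Each loop fine-tunes a generator on everything collected so far,
    generates candidate pairs, and keeps only the pairs a reviewer
    accepts, possibly after post-editing.
    """
    data = list(seed)
    for _ in range(n_loops):
        model = fine_tune(data)          # refine on data from previous loops
        for candidate in generate(model):
            checked = review(candidate)  # None means the expert rejected it
            if checked is not None and checked not in data:
                data.append(checked)
    return data


# Toy stand-ins (assumptions for illustration only).
def toy_fine_tune(data: List[Pair]) -> int:
    return len(data)  # the "model" is just the current dataset size


def toy_generate(model: int) -> List[Pair]:
    return [(f"HS example {model}", "draft reply"), ("", "empty")]


def toy_review(pair: Pair) -> Optional[Pair]:
    hs, cn = pair
    if not hs:
        return None  # reject malformed candidates
    return (hs, cn.replace("draft", "post-edited"))


result = collect_with_experts(
    [("HS seed", "CN seed")], toy_fine_tune, toy_generate, toy_review, n_loops=3
)
```

In a real setting the three callables would wrap a language model and an annotation interface; the sketch only shows how accepted samples feed back into the next loop's training data.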
