CL Oct 14, 2025

Beating Harmful Stereotypes Through Facts: RAG-based Counter-speech Generation

Greta Damo, Elena Cabrio, Serena Villata

arXiv:2510.12316v1 2.7 h-index: 37

Originality: Incremental advance

AI Analysis

The paper addresses the need for scalable and trustworthy counter-speech generation against hate speech targeting groups such as women and minorities, though the advance is incremental because it builds on existing RAG methods.

The paper tackles the problem of generating reliable counter-speech against harmful stereotypes. It introduces a framework that models counter-speech generation as a knowledge-wise text generation process, integrating Retrieval-Augmented Generation (RAG) pipelines with a knowledge base of 32,792 texts. In both automatic and human evaluations, the framework outperforms standard LLM baselines and competitive approaches.

Counter-speech generation is at the core of many expert activities that counter harmful content, such as fact-checking and tackling hate speech. Yet, existing work treats counter-speech generation as a pure text generation task, mainly based on Large Language Models or NGO experts. These approaches show severe drawbacks: limited reliability and coherence of the generated countering text in the former case, and limited scalability in the latter. To close this gap, we introduce a novel framework to model counter-speech generation as a knowledge-wise text generation process. Our framework integrates advanced Retrieval-Augmented Generation (RAG) pipelines to ensure the generation of trustworthy counter-speech for 8 main target groups identified in the hate speech literature, including women, people of colour, persons with disabilities, migrants, Muslims, Jews, LGBT persons, and other. We built a knowledge base over the United Nations Digital Library, EUR-Lex and the EU Agency for Fundamental Rights, comprising a total of 32,792 texts. We use the MultiTarget-CONAN dataset to empirically assess the quality of the generated counter-speech, both through standard metrics (i.e., JudgeLM) and a human evaluation. Results show that our framework outperforms standard LLM baselines and competitive approaches on both assessments. The resulting framework and the knowledge base pave the way for studying trustworthy and sound counter-speech generation, in hate speech and beyond.
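
To make the retrieve-then-generate idea concrete, the following is a minimal sketch of the general RAG pattern, not the authors' pipeline. It assumes a toy bag-of-words cosine retriever in place of whatever retriever the paper uses, and the passages are invented placeholders, not texts from the paper's knowledge base. The names `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are hypothetical, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

# Invented placeholder passages (assumption); the paper's knowledge base
# holds 32,792 texts from the UN Digital Library, EUR-Lex and the FRA.
KNOWLEDGE_BASE = [
    "Placeholder passage: migrants contribute to the labour market and pay taxes.",
    "Placeholder passage: women remain underrepresented in senior management roles.",
    "Placeholder passage: persons with disabilities have a right to accessible education.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(message: str, evidence: list[str]) -> str:
    """Assemble a generation prompt that grounds the reply in the evidence."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(evidence, 1))
    return (
        "Write a respectful, factual counter-speech reply to the message below.\n"
        "Use only the numbered evidence passages.\n\n"
        f"Evidence:\n{numbered}\n\n"
        f"Message: {message}\n"
        "Counter-speech:"
    )


if __name__ == "__main__":
    message = "Migrants steal jobs and never pay taxes."
    evidence = retrieve(message, KNOWLEDGE_BASE, k=1)
    # The prompt would then be passed to an LLM of choice (not shown here).
    print(build_prompt(message, evidence))
```

In a realistic setting the toy retriever would likely be replaced by a dense or hybrid retriever over the full knowledge base; the sketch only shows where retrieved evidence enters the generation prompt.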
