CL AI CY HC LG | Jun 4, 2025

Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate

Mikel K. Ngueajio, Flor Miriam Plaza-del-Arco, Yi-Ling Chung, Danda B. Rawat, Amanda Cercas Curry

arXiv:2506.04043v12.71 citations | h-index: 14 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of making automated counter-narratives more accessible and ethical for online hate speech mitigation, though it is incremental in nature.

The study evaluated large language models for generating counter-narratives against hate speech and found that their responses are often verbose and written at a college reading level. Emotionally guided prompts improved empathy and readability, but concerns about safety and effectiveness remain.

Automated counter-narratives (CN) offer a promising strategy for mitigating online hate speech, yet concerns about their affective tone, accessibility, and ethical risks remain. We propose a framework for evaluating Large Language Model (LLM)-generated CNs across four dimensions: persona framing, verbosity and readability, affective tone, and ethical robustness. Using GPT-4o-Mini, Cohere's CommandR-7B, and Meta's LLaMA 3.1-70B, we assess three prompting strategies on the MT-Conan and HatEval datasets. Our findings reveal that LLM-generated CNs are often verbose and adapted for people with college-level literacy, limiting their accessibility. While emotionally guided prompts yield more empathetic and readable responses, there remain concerns surrounding safety and effectiveness.
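To make the "college-level literacy" finding concrete, here is a minimal sketch of how verbosity and readability could be scored for a single counter-narrative. It is an illustration only: the abstract does not say which readability metric the authors used, so the Flesch-Kincaid grade level is an assumption, and the helper names (`count_syllables`, `fk_grade`, `word_count`) and the vowel-group syllable heuristic are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def word_count(text: str) -> int:
    """Verbosity proxy: number of word tokens in the text."""
    return len(re.findall(r"[A-Za-z']+", text))


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level (assumed metric, not confirmed by the paper).

    grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = (
        "Discriminatory generalizations systematically undermine "
        "constructive intercultural communication."
    )
    for label, text in [("simple", simple), ("dense", dense)]:
        print(label, word_count(text), round(fk_grade(text), 2))
```

A grade of roughly 13 or higher corresponds to college-level reading, so a score like this could flag responses that are too dense for a general audience. The syllable heuristic is approximate; a dedicated readability library would give more reliable counts.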
