Jun 4, 2025

Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate

arXiv:2506.04043v1 · 1 citation · h-index: 14
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of making automated counter-narratives more accessible and ethical for online hate speech mitigation, though it is incremental in nature.

The study evaluated large language models for generating counter-narratives against hate speech and found that they often produce verbose responses written at a college reading level. Emotionally guided prompts yielded more empathetic and readable responses, though concerns about safety and effectiveness remain.

Automated counter-narratives (CN) offer a promising strategy for mitigating online hate speech, yet concerns about their affective tone, accessibility, and ethical risks remain. We propose a framework for evaluating Large Language Model (LLM)-generated CNs across four dimensions: persona framing, verbosity and readability, affective tone, and ethical robustness. Using GPT-4o-Mini, Cohere's CommandR-7B, and Meta's LLaMA 3.1-70B, we assess three prompting strategies on the MT-Conan and HatEval datasets. Our findings reveal that LLM-generated CNs are often verbose and adapted for people with college-level literacy, limiting their accessibility. While emotionally guided prompts yield more empathetic and readable responses, there remain concerns surrounding safety and effectiveness.
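The abstract does not say which readability metric the authors used, so the following is an illustration only. It computes two common proxies for the "verbosity and readability" dimension: word count and the Flesch-Kincaid Grade Level, 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. On that scale, a grade above 12 roughly corresponds to college-level text. The function names are hypothetical, the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup, and the two sample counter-narratives are invented for the example.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent (e.g. "hate"), but not "-le" or "-ee" endings.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability_profile(text: str) -> dict:
    """Hypothetical helper: word count plus Flesch-Kincaid Grade Level for one text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are letter runs, optionally with one internal apostrophe ("isn't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    fk_grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return {
        "words": len(words),
        "sentences": len(sentences),
        "fk_grade": round(fk_grade, 1),
    }


# Invented example counter-narratives, not taken from MT-Conan or HatEval.
short_cn = "Hate hurts people. Please be kind."
verbose_cn = (
    "Generalizations about entire communities perpetuate discriminatory "
    "stereotypes and undermine constructive, evidence-based dialogue "
    "regarding immigration policy."
)

if __name__ == "__main__":
    for name, cn in [("short", short_cn), ("verbose", verbose_cn)]:
        print(name, readability_profile(cn))
```

Because the heuristic syllable counter tends to overcount long words, absolute grades are only indicative. A validated readability library would be the safer choice for real measurements, but the relative ordering between a terse and a verbose response is already visible here.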

Code Implementations: 1 repo
