Multilingual Hate Speech Detection and Counterspeech Generation: A Comprehensive Survey and Practical Guide
This survey addresses the problem of combating online hate speech across diverse languages and cultures for researchers, practitioners, and policymakers; its contribution is incremental, since it synthesizes existing work rather than introducing new methods.
This paper tackles the problem of online hate speech in multilingual settings by providing a comprehensive survey and practical guide that analyzes why monolingual systems fail in non-English contexts and outlines a three-phase framework for detection and counterspeech generation. It consolidates progress in multilingual resources while highlighting persistent obstacles like data scarcity in low-resource languages and fairness issues.
Combating online hate speech in multilingual settings requires approaches that go beyond English-centric models and capture the cultural and linguistic diversity of global online discourse. This paper presents a comprehensive survey and practical guide to multilingual hate speech detection and counterspeech generation, integrating recent advances in natural language processing. We analyze why monolingual systems often fail in non-English and code-mixed contexts, where they miss implicit hate and culturally specific expressions. To address these challenges, we outline a structured three-phase framework (task design, data curation, and evaluation) that draws on state-of-the-art datasets, models, and metrics. The survey consolidates progress in multilingual resources and techniques while highlighting persistent obstacles, including data scarcity in low-resource languages, fairness and bias in system development, and the need for multimodal solutions. By bridging technical progress with ethical and cultural considerations, we provide researchers, practitioners, and policymakers with scalable guidelines for building context-aware, inclusive systems. Our roadmap contributes to advancing online safety through fairer, more effective detection and counterspeech generation across diverse linguistic environments.