CL | Jul 22, 2025

BIDWESH: A Bangla Regional Based Hate Speech Detection Dataset

arXiv:2507.16183v1 | 6 citations | h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the gap in hate speech detection for regional Bangla dialects, which is crucial for equitable content moderation in linguistically diverse regions like Bangladesh.

The study tackled the problem of hate speech detection in Bangla dialects by introducing BIDWESH, a multi-dialectal dataset with 9,183 annotated instances, resulting in a linguistically rich resource for improving detection capabilities in low-resource settings.

Hate speech on digital platforms has become a growing concern globally, especially in linguistically diverse countries like Bangladesh, where regional dialects play a major role in everyday communication. Despite progress in hate speech detection for standard Bangla, existing datasets and systems fail to address the informal and culturally rich expressions found in dialects such as Barishal, Noakhali, and Chittagong. This oversight results in limited detection capability and biased moderation, leaving large sections of harmful content unaccounted for. To address this gap, this study introduces BIDWESH, the first multi-dialectal Bangla hate speech dataset, constructed by translating and annotating 9,183 instances from the BD-SHS corpus into three major regional dialects. Each entry was manually verified and labeled for hate presence, type (slander, gender, religion, call to violence), and target group (individual, male, female, group), ensuring linguistic and contextual accuracy. The resulting dataset provides a linguistically rich, balanced, and inclusive resource for advancing hate speech detection in Bangla. BIDWESH lays the groundwork for the development of dialect-sensitive NLP tools and contributes significantly to equitable and context-aware content moderation in low-resource language settings.
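The annotation scheme described above (hate presence, hate type, target group, across three dialects) can be pictured as a small record type. The following is only an illustrative sketch: the class name `DialectEntry`, the field names, and the lowercase label strings are assumptions made for this example and are not taken from the released dataset, whose actual file format and column names may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Label vocabularies as listed in the abstract; the exact string
# spellings used here are assumptions, not the dataset's own values.
DIALECTS = {"barishal", "noakhali", "chittagong"}
HATE_TYPES = {"slander", "gender", "religion", "call to violence"}
TARGET_GROUPS = {"individual", "male", "female", "group"}


@dataclass(frozen=True)
class DialectEntry:
    """One annotated sentence (hypothetical schema)."""
    text: str                        # sentence written in the regional dialect
    dialect: str                     # one of DIALECTS
    is_hate: bool                    # hate presence label
    hate_type: Optional[str] = None  # one of HATE_TYPES, if annotated
    target: Optional[str] = None     # one of TARGET_GROUPS, if annotated

    def __post_init__(self) -> None:
        # Reject labels outside the vocabularies named in the abstract.
        if self.dialect not in DIALECTS:
            raise ValueError(f"unknown dialect: {self.dialect!r}")
        if self.hate_type is not None and self.hate_type not in HATE_TYPES:
            raise ValueError(f"unknown hate type: {self.hate_type!r}")
        if self.target is not None and self.target not in TARGET_GROUPS:
            raise ValueError(f"unknown target group: {self.target!r}")


# Usage with placeholder text (not a real dataset entry).
entry = DialectEntry(
    text="<dialect sentence>",
    dialect="noakhali",
    is_hate=True,
    hate_type="religion",
    target="group",
)
```

Validating labels at construction time is one way to catch annotation typos early; whether hate type and target are left empty for non-hate entries is not stated in the abstract, so the sketch leaves both fields optional.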
