CL · SI · Jan 12, 2024

MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection

arXiv:2401.06526v1 · 18 citations · h-index: 13 · ICWSM
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for a unified dataset to improve hate speech detection for researchers and practitioners, though it is incremental as it aggregates existing data rather than introducing new methods.

The study tackled the problem of fragmented hate speech detection datasets by creating MetaHate, a comprehensive meta-collection integrating over 60 existing datasets to unify efforts and enable more robust model training.

Hate speech represents a pervasive and detrimental form of online discourse, often manifested through an array of slurs, from hateful tweets to defamatory posts. As such speech proliferates, it reaches people globally and poses significant social, psychological, and occasionally physical threats to targeted individuals and communities. Current computational linguistic approaches for tackling this phenomenon rely on labelled social media datasets for training. To unify efforts, our study addresses the critical need for a comprehensive meta-collection, advocating for an extensive dataset to help counteract this problem effectively. We scrutinized over 60 datasets, selectively integrating the pertinent ones into MetaHate. This paper offers a detailed examination of existing collections, highlighting their strengths and limitations. Our findings contribute to a deeper understanding of the existing datasets, paving the way for training more robust and adaptable models. These enhanced models are essential for effectively combating the dynamic and complex nature of hate speech in the digital realm.

Code Implementations: 1 repo