CL · Nov 6, 2020

Hostility Detection Dataset in Hindi

arXiv:2011.03588v1 · 81 citations
AI Analysis

This dataset addresses the shortage of resources for hostility detection in Hindi. The contribution is incremental, since it builds on existing work in other languages.

The authors address the lack of a comprehensive hostility detection dataset in Hindi by collecting and manually annotating approximately 8,200 online posts across four hostility dimensions (fake news, hate speech, offensive, and defamation), plus a non-hostile label. The dataset was released publicly as part of the CONSTRAINT-2021 shared task.

In this paper, we present a novel hostility detection dataset in the Hindi language. We collect and manually annotate ~8200 online posts. The annotated dataset covers four hostility dimensions: fake news, hate speech, offensive, and defamation posts, along with a non-hostile label. The hostile posts are also considered for multi-label tags due to a significant overlap among the hostile classes. We release this dataset as part of the CONSTRAINT-2021 shared task on hostile post detection.
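
The label scheme described in the abstract (a non-hostile label, or one or more of four overlapping hostile classes) can be illustrated with a minimal Python sketch. The tag strings and the `encode_labels` helper are assumptions made for illustration; the released dataset may spell or store its labels differently.

```python
# Hypothetical tag names for the four hostility dimensions; the released
# files may use different spellings.
HOSTILE_LABELS = ["fake", "hate", "offensive", "defamation"]
NON_HOSTILE = "non-hostile"


def encode_labels(tags):
    """Map a post's tags to (is_hostile, multi_hot).

    multi_hot has one 0/1 entry per hostile class, in HOSTILE_LABELS order,
    so a post can carry several hostile tags at once.
    """
    normalized = {t.strip().lower() for t in tags}
    if not normalized:
        raise ValueError("a post needs at least one tag")
    unknown = normalized - set(HOSTILE_LABELS) - {NON_HOSTILE}
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    # Assumption: non-hostile is exclusive and cannot co-occur with hostile tags.
    if NON_HOSTILE in normalized and len(normalized) > 1:
        raise ValueError("non-hostile cannot be combined with hostile tags")
    multi_hot = [1 if label in normalized else 0 for label in HOSTILE_LABELS]
    return any(multi_hot), multi_hot


if __name__ == "__main__":
    print(encode_labels(["hate", "offensive"]))
    print(encode_labels(["non-hostile"]))
```

Separating the coarse hostile/non-hostile decision from the fine-grained multi-hot vector mirrors the two levels the abstract describes, though how the shared task actually scores them is not stated here.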

Code Implementations: 1 repo
