CL · Nov 6, 2020

Hostility Detection Dataset in Hindi

Mohit Bhardwaj, Md Shad Akhtar, Asif Ekbal, Amitava Das, Tanmoy Chakraborty

arXiv:2011.03588v1

Originality: Synthesis-oriented

AI Analysis

This dataset addresses the scarcity of resources for hostility detection in Hindi. The contribution is incremental, as it builds on existing work in other languages.

The authors address the lack of a comprehensive hostility detection dataset in Hindi by collecting and manually annotating approximately 8,200 online posts across four hostility dimensions. The resulting dataset is publicly released as part of the CONSTRAINT-2021 shared task.

In this paper, we present a novel hostility detection dataset in Hindi. We collect and manually annotate ~8200 online posts. The annotated dataset covers four hostility dimensions (fake news, hate speech, offensive, and defamation posts) along with a non-hostile label. Because the hostile classes overlap significantly, hostile posts can carry multiple labels. We release this dataset as part of the CONSTRAINT-2021 shared task on hostile post detection.
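The label scheme described above, with a single non-hostile label plus up to four overlapping hostile labels, maps naturally onto a multi-label classification target. The following is a minimal sketch, assuming each post arrives with a comma-separated label string. The label spellings, the helper names `parse_labels`, `to_multi_hot`, and `is_hostile`, and the rule that "non-hostile" never co-occurs with a hostile label are illustrative assumptions, not the official CONSTRAINT-2021 file format or loader.

```python
from typing import FrozenSet, List

# Hypothetical label spellings; the released files may name them differently.
HOSTILE_LABELS: List[str] = ["fake", "hate", "offensive", "defamation"]
NON_HOSTILE = "non-hostile"


def parse_labels(raw: str) -> FrozenSet[str]:
    """Parse a comma-separated label string such as 'hate,offensive'."""
    labels = frozenset(p.strip().lower() for p in raw.split(",") if p.strip())
    if not labels:
        raise ValueError("a post must carry at least one label")
    unknown = labels - set(HOSTILE_LABELS) - {NON_HOSTILE}
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Assumption: non-hostile is exclusive and cannot co-occur with hostile labels.
    if NON_HOSTILE in labels and len(labels) > 1:
        raise ValueError("'non-hostile' cannot be combined with hostile labels")
    return labels


def to_multi_hot(labels: FrozenSet[str]) -> List[int]:
    """Encode hostile labels as a 0/1 vector; non-hostile posts map to all zeros."""
    return [1 if name in labels else 0 for name in HOSTILE_LABELS]


def is_hostile(labels: FrozenSet[str]) -> bool:
    """Coarse-grained decision: any post not tagged non-hostile is hostile."""
    return NON_HOSTILE not in labels


if __name__ == "__main__":
    print(to_multi_hot(parse_labels("hate, offensive")))  # [0, 1, 1, 0]
    print(to_multi_hot(parse_labels("non-hostile")))      # [0, 0, 0, 0]
```

Encoding non-hostile posts as an all-zero vector keeps the fine-grained target compact while still letting the coarse hostile/non-hostile decision be derived from it.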

