CL · LG · Nov 23, 2020

Effect of Word Embedding Models on Hate and Offensive Speech Detection

Safa Alsafari, Samira Sadaoui, Malek Mouhoub

arXiv:2012.07534v1 · 0.37 citations

Originality: Synthesis-oriented

AI Analysis

This research provides insights into optimal model choices for hate speech detection in Arabic, which is important for researchers and practitioners working on content moderation in this language.

This paper investigates the impact of word embedding models and neural network architectures on hate and offensive speech detection in Arabic. The study found that skip-gram models and CNN networks consistently outperformed other configurations across 2-class, 3-class, and 6-class classification tasks.

Deep neural networks have been adopted successfully in hate speech detection problems. Nevertheless, the effect of the word embedding models on the neural network's performance has not been appropriately examined in the literature. In our study, through different detection tasks, 2-class, 3-class, and 6-class classification, we investigate the impact of both word embedding models and neural network architectures on the predictive accuracy. Our focus is on the Arabic language. We first train several word embedding models on a large-scale unlabelled Arabic text corpus. Next, based on a dataset of Arabic hate and offensive speech, for each detection task, we train several neural network classifiers using the pre-trained word embedding models. This task yields a large number of various learned models, which allows conducting an exhaustive comparison. The empirical analysis demonstrates, on the one hand, the superiority of the skip-gram models and, on the other hand, the superiority of the CNN network across the three detection tasks.
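As a rough illustration of what training a skip-gram embedding involves, the sketch below builds the (center word, context word) pairs that a skip-gram model learns to predict from unlabelled text. It is a minimal sketch and not the authors' code: the function name `skipgram_pairs`, the `window` parameter, and the toy token list are all assumptions made for illustration, and the paper's corpus, preprocessing, and hyperparameters are not reproduced here.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) training pairs for a skip-gram model.

    Each token is paired with every other token that lies at most
    `window` positions to its left or right in the same sequence.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        # Clip the context window at the sequence boundaries.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy sequence standing in for one tokenized sentence
# of an unlabelled corpus (not taken from the paper's data).
tokens = ["a", "b", "c", "d"]
for center, context in skipgram_pairs(tokens, window=1):
    print(center, "->", context)
```

In a full pipeline, pairs like these would drive the training of the embedding vectors, and the resulting vectors would then initialize the embedding layer of a downstream classifier such as the CNN mentioned in the abstract.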
