CL, Apr 3, 2021

Sexism detection: The first corpus in Algerian dialect with a code-switching in Arabic/ French and English

Imane Guellil, Ahsan Adeel, Faical Azouaou, Mohamed Boubred, Yousra Houichi, Akram Abdelhaq Moumna

arXiv:2104.01443v11.08 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the lack of resources for hate speech detection in Arabic dialects, which is important for moderating online content in underrepresented languages, though it is incremental as it applies existing methods to new data.

The paper tackles the detection of sexist hate speech in the Arabic community on social media. It builds the first corpus in Algerian dialect with code-switching between Arabic, French, and English, and reports an F1-score of up to 86% with a CNN model on the unbalanced version of that corpus.

In this paper, an approach for detecting hate speech against women in the Arabic community on social media (e.g. YouTube) is proposed. In the literature, similar works have been presented for other languages such as English. However, to the best of our knowledge, not much work has been conducted for the Arabic language. A new hate speech corpus (Arabic_fr_en) is developed and labeled by three different annotators. For corpus validation, three different deep learning models are used: a deep Convolutional Neural Network (CNN), a long short-term memory (LSTM) network, and a bi-directional LSTM (Bi-LSTM) network. Simulation results show that the CNN model performs best, achieving an F1-score of up to 86% on the unbalanced corpus and outperforming both LSTM and Bi-LSTM.
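Because the result is reported as an F1-score on an unbalanced corpus, it may help to see how F1 differs from plain accuracy when one class dominates. The following is a minimal sketch in pure Python. The labels are invented toy data, not drawn from the Arabic_fr_en corpus, and the helper names `f1_score` and `accuracy` are illustrative assumptions rather than anything taken from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy labels: 1 = hateful, 0 = not hateful (3 positives out of 10).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

# A degenerate classifier that always predicts the majority class.
y_majority = [0] * len(y_true)

print("model    accuracy:", accuracy(y_true, y_pred), "F1:", f1_score(y_true, y_pred))
print("majority accuracy:", accuracy(y_true, y_majority), "F1:", f1_score(y_true, y_majority))
```

In this toy setting the majority-class baseline still reaches 70% accuracy while its F1 for the positive class is zero, which is one reason F1 is a common choice for unbalanced data.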

