CL · Nov 18, 2021

Automatic Expansion and Retargeting of Arabic Offensive Language Training

Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish, Younes Samih

arXiv:2111.09574v1 · 0.51 citations

Originality: Incremental advance

AI Analysis

This work addresses the detection of targeted offensive language in Arabic social media content. The advance is incremental: it builds on existing offensive language detection methods by focusing on entity-specific phenomena.

The paper identifies entity-specific offensive language in Arabic tweets by exploiting two observations: replies on Twitter often imply opposition, and some accounts are persistently offensive toward specific targets. The targeted data collected this way yields relative F1-measure improvements of 13% and 79% for deep-learning-based and SVM-based classifiers, respectively. Expanding the training set with automatically identified offensive tweets directed at multiple entities can improve F1-measure by 48%.

Rampant use of offensive language on social media has led to recent efforts on automatic identification of such language. Though offensive language has general characteristics, attacks on specific entities may exhibit distinct phenomena such as malicious alterations in the spelling of names. In this paper, we present a method for identifying entity-specific offensive language. We employ two key insights, namely that replies on Twitter often imply opposition and that some accounts are persistent in their offensiveness towards specific targets. Using our methodology, we are able to collect thousands of targeted offensive tweets. We show the efficacy of the approach on Arabic tweets with 13% and 79% relative F1-measure improvement in entity-specific offensive language detection when using deep-learning-based and support-vector-machine-based classifiers, respectively. Further, expanding the training set with automatically identified offensive tweets directed at multiple entities can improve F1-measure by 48%.
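
The abstract reports its gains as relative F1-measure improvements, which are easy to misread as absolute point gains. The sketch below shows the usual arithmetic for a relative improvement. The function name and the example scores are hypothetical, chosen only to illustrate the calculation; they are not figures from the paper.

```python
def relative_improvement(baseline_f1: float, new_f1: float) -> float:
    """Return the change of new_f1 over baseline_f1 as a fraction of the baseline."""
    if baseline_f1 <= 0:
        raise ValueError("baseline F1 must be positive")
    return (new_f1 - baseline_f1) / baseline_f1


# Hypothetical scores (NOT from the paper): a baseline F1 of 0.50
# rising to 0.565 is a 0.065 absolute gain but a 13% relative gain.
baseline, improved = 0.50, 0.565
print(f"absolute gain: {improved - baseline:.3f}")
print(f"relative gain: {relative_improvement(baseline, improved):.0%}")
```

Under this definition, a large relative figure such as 79% can correspond to a modest absolute gain when the baseline score is low, so the two kinds of number should not be compared directly.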
