CL IR Jul 27, 2020

NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic Offensive Language Detection in Arabic Tweets

arXiv:2007.13339v1991 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for automated moderation of offensive content in Arabic social media, but its contribution is incremental: it applies a standard TF/IDF-based method to a single task dataset.

The paper tackles automatic detection of offensive language in Arabic tweets with a machine learning approach. It achieved an F1-score of 81.82% on the test set of the SemEval-2020 task, where submitted systems ranged from 44.51% to 90.17%.

In this paper, we present the system submitted to SemEval-2020 Task 12. The proposed system aims to automatically identify offensive language in Arabic tweets. We used a machine-learning-based approach to design the system, implementing a linear classifier with Stochastic Gradient Descent (SGD) as the optimization algorithm. Our model achieved F1-scores of 84.20% and 81.82% on the development and test sets, respectively. The best-performing system and the last-ranked system reported F1-scores of 90.17% and 44.51% on the test set, respectively.
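As a rough illustration of the kind of pipeline the abstract describes (TF/IDF features fed to a linear classifier trained with SGD), here is a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not state the loss function, tokenization, or n-gram range, so the logistic loss, whitespace tokenization, unigram features, and all function names and hyperparameters below are assumptions. The toy English sentences are hypothetical placeholders, not data from the task.

```python
import math
import random
from collections import Counter


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc.split()))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """Map a document to an L2-normalised sparse TF/IDF vector (a dict)."""
    tf = Counter(tok for tok in doc.split() if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def score(x, weights, bias):
    """Linear decision function w.x + b over a sparse vector."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in x.items())


def train_sgd(vectors, labels, epochs=50, lr=0.5, seed=0):
    """Train a linear classifier with per-example SGD.

    Assumption: logistic loss; the abstract does not name the loss.
    """
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            p = 1.0 / (1.0 + math.exp(-score(x, weights, bias)))
            grad = p - y  # derivative of the logistic loss w.r.t. the score
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) - lr * grad * v
            bias -= lr * grad
    return weights, bias


def predict(doc, idf, weights, bias):
    """Return 1 (offensive) if the linear score is positive, else 0."""
    return 1 if score(tfidf(doc, idf), weights, bias) > 0 else 0


# Hypothetical toy data (English placeholders, not the Arabic task data).
texts = [
    "you are a stupid idiot",
    "what a lovely day",
    "shut up you idiot",
    "thanks for the kind help",
    "stupid ugly loser",
    "have a lovely evening friend",
]
labels = [1, 0, 1, 0, 1, 0]

idf = fit_idf(texts)
weights, bias = train_sgd([tfidf(t, idf) for t in texts], labels)
print(predict("you idiot", idf, weights, bias))
```

In practice a library implementation with character or word n-grams and held-out evaluation would be the more likely choice. The sketch only shows how the two stages named in the abstract fit together.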

