NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic Offensive Language Detection in Arabic Tweets
This work addresses the need for automated moderation of offensive content in Arabic social media, but its contribution is incremental: it applies a standard TF/IDF-based method to a single task-specific dataset.
The paper tackles the problem of automatically detecting offensive language in Arabic tweets using a machine learning approach. It achieves an F1-score of 81.82% on the test set, about 8.4 points below the top-ranked system (90.17%) in the SemEval-2020 task.
In this paper, we present the system submitted to SemEval-2020 Task 12. The proposed system aims to automatically identify offensive language in Arabic tweets. A machine learning-based approach was used to design the system. We implemented a linear classifier with Stochastic Gradient Descent (SGD) as the optimization algorithm. Our model achieved F1-scores of 84.20% and 81.82% on the development set and test set, respectively. The best-performing system and the last-ranked system reported F1-scores of 90.17% and 44.51% on the test set, respectively.
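The pipeline described above (TF/IDF features fed to a linear classifier optimized with SGD) can be illustrated with a minimal, dependency-free sketch. It is not the submitted system. The whitespace tokenizer, the smoothed IDF formula, the logistic loss, the learning rate, the epoch count, and the toy English placeholder texts are all assumptions made for illustration, and every function name here is hypothetical. The abstract does not state which F1 averaging was used, so `f1_score` computes F1 for the positive class only.

```python
import math
import random
from collections import Counter

def fit_idf(docs):
    """Smoothed inverse document frequency per token (assumed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc.split()))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(doc, idf):
    """Sparse, L2-normalised TF/IDF vector; unseen tokens are ignored."""
    tf = Counter(tok for tok in doc.split() if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}

def score(x, weights, bias):
    """Linear decision function w.x + b on a sparse vector."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in x.items())

def train_sgd(vectors, labels, epochs=50, lr=0.5, seed=0):
    """Linear classifier trained with per-sample SGD on logistic loss
    (the loss function is an assumption; the abstract does not name one)."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            p = 1.0 / (1.0 + math.exp(-score(x, weights, bias)))
            grad = p - y  # derivative of the logistic loss w.r.t. the score
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) - lr * grad * v
            bias -= lr * grad
    return weights, bias

def predict(doc, idf, weights, bias):
    """Return 1 (offensive) if the score is positive, else 0."""
    return 1 if score(tfidf(doc, idf), weights, bias) > 0 else 0

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy placeholder data (hypothetical; stands in for labelled tweets).
train_texts = [
    "you are awful stupid", "awful stupid person", "stupid awful idea",
    "have a nice day", "nice lovely day friend", "lovely nice friend",
]
train_labels = [1, 1, 1, 0, 0, 0]

idf = fit_idf(train_texts)
weights, bias = train_sgd([tfidf(d, idf) for d in train_texts], train_labels)
train_preds = [predict(d, idf, weights, bias) for d in train_texts]
print("training F1:", f1_score(train_labels, train_preds))
```

In practice, a library implementation of both steps would normally replace these hand-written functions. The sketch only makes explicit how the sparse TF/IDF weights enter the per-sample gradient update.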