Sentiment Analysis of Arabic Tweets: Feature Engineering and A Hybrid Approach
This work addresses sentiment analysis of Arabic tweets written in the Saudi dialect. Its contribution is incremental: it applies existing hybrid methods to a specific domain.
The paper tackles sentiment analysis of Saudi-dialect Arabic tweets by developing a hybrid method that combines corpus-based and lexicon-based approaches. It reports best F1-scores of 69.9, 61.63, and 55.07 for the two-way, three-way, and four-way classification models, respectively.
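The hybrid idea summarized above can be illustrated with a minimal sketch. It assumes one common design: lexicon-derived features (counts of positive and negative lexicon hits and their summed polarity) are appended to corpus-based bag-of-words features, and the combined vector is fed to a trained classifier. The lexicon `LEXICON`, the whitespace tokenizer, the helper names, and the choice of a multiclass perceptron are all hypothetical stand-ins, not the paper's actual lexicon, preprocessing, or classifier.

```python
from collections import Counter

# Hypothetical toy lexicon mapping dialectal words to polarity weights.
LEXICON = {"جميل": 1.0, "رائع": 1.0, "حلو": 1.0, "سيء": -1.0, "ممل": -1.0}

LABELS = ["positive", "negative"]  # two-way setting; extend for 3- or 4-way


def tokenize(tweet: str) -> list[str]:
    # Naive whitespace split (assumption); real Arabic tweets need normalization.
    return tweet.split()


def hybrid_features(tweet: str, lexicon: dict[str, float]) -> Counter:
    tokens = tokenize(tweet)
    # Corpus-based part: bag-of-words counts.
    feats = Counter(f"w={t}" for t in tokens)
    # Lexicon-based part: number of hits per polarity and summed polarity.
    scores = [lexicon[t] for t in tokens if t in lexicon]
    feats["lex_pos"] = sum(1 for s in scores if s > 0)
    feats["lex_neg"] = sum(1 for s in scores if s < 0)
    feats["lex_score"] = sum(scores)
    feats["bias"] = 1
    return feats


def predict(weights: dict[str, Counter], feats: Counter) -> str:
    # Highest linear score wins; ties go to the earlier label in LABELS.
    return max(weights, key=lambda lab: sum(weights[lab][f] * v for f, v in feats.items()))


def train_perceptron(data: list[tuple[Counter, str]], epochs: int = 10) -> dict[str, Counter]:
    weights = {lab: Counter() for lab in LABELS}
    for _ in range(epochs):
        for feats, gold in data:
            pred = predict(weights, feats)
            if pred != gold:
                # Standard multiclass perceptron update.
                for f, v in feats.items():
                    weights[gold][f] += v
                    weights[pred][f] -= v
    return weights


# Hypothetical toy training tweets (not from the paper's dataset).
train = [
    ("الفيلم جميل", "positive"),
    ("المطعم رائع", "positive"),
    ("الفيلم ممل", "negative"),
    ("المطعم سيء", "negative"),
]
weights = train_perceptron([(hybrid_features(t, LEXICON), y) for t, y in train])
print(predict(weights, hybrid_features("المسلسل حلو", LEXICON)))
```

Because `حلو` never appears in the toy training tweets, the label for `المسلسل حلو` is driven entirely by the lexicon features; this is the kind of coverage gap a lexicon is meant to fill when labeled dialectal data is sparse.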
Sentiment analysis in Arabic is a challenging task due to the rich morphology of the language. The task is further complicated when applied to Twitter data, which is known to be highly informal and noisy. In this paper, we develop a hybrid method for sentiment analysis of Arabic tweets written in a specific Arabic dialect, the Saudi dialect. Several features were engineered and evaluated using backward feature selection. A hybrid method that combines corpus-based and lexicon-based methods was then developed for several classification models (two-way, three-way, and four-way). The best F1-scores for these models were 69.9, 61.63, and 55.07, respectively.
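Backward feature selection, as mentioned in the abstract, can be sketched as greedy elimination over feature groups: start from all groups, repeatedly drop the group whose removal yields the highest score, and stop once every remaining removal would lower the score. This is a minimal sketch under stated assumptions: the feature-group names, `CONTRIBUTION`, and `toy_f1` are hypothetical, and in practice the scoring function would train a classifier on the chosen groups and return its F1-score on held-out data.

```python
from collections.abc import Callable


def backward_selection(
    groups: list[str],
    score_fn: Callable[[frozenset[str]], float],
) -> tuple[frozenset[str], float]:
    """Greedy backward elimination over feature groups."""
    current = frozenset(groups)
    best_score = score_fn(current)
    while len(current) > 1:
        # Score every subset that drops exactly one remaining group.
        candidates = [(score_fn(current - {g}), g) for g in sorted(current)]
        cand_score, cand_group = max(candidates)
        if cand_score < best_score:
            break  # every possible removal lowers the score
        # Accept a removal that does not lower the score.
        current = current - {cand_group}
        best_score = cand_score
    return current, best_score


# Hypothetical per-group contributions standing in for a real evaluation.
CONTRIBUTION = {
    "unigrams": 40.0, "bigrams": 5.0, "lexicon_score": 15.0, "emoji": 6.0,
    "negation": 3.0, "tweet_length": -2.0, "elongation": -1.0,
}


def toy_f1(subset: frozenset[str]) -> float:
    # Placeholder: a real score_fn would train a classifier on `subset`
    # and return its F1-score on held-out tweets.
    return sum(CONTRIBUTION[g] for g in subset)


selected, score = backward_selection(list(CONTRIBUTION), toy_f1)
print(sorted(selected), score)
```

With these toy numbers, the loop drops `tweet_length` and then `elongation` and keeps the other five groups. Accepting removals that leave the score unchanged favors smaller feature sets; a stricter variant would require a strict improvement before dropping a group.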