CL, LG | Jan 29, 2022

A Deep CNN Architecture with Novel Pooling Layer Applied to Two Sudanese Arabic Sentiment Datasets

Mustafa Mhamed, Richard Sutcliffe, Xia Sun, Jun Feng, Eiad Almekhlafi, Ephrem A. Retta

arXiv:2201.12664v1 | 0.3 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the under-researched problem of sentiment analysis for Sudanese Arabic, a dialect with 32 million speakers, by providing new datasets and a model. The contribution is incremental, since it builds on existing CNN methods.

The paper tackles sentiment analysis for Sudanese Arabic by introducing two new datasets and a CNN architecture with a novel pooling layer. The model achieves accuracies of 92.75% and 84.39% on the two new datasets, respectively, and competitive results on existing ones.

Arabic sentiment analysis has become an important research field in recent years. Initially, work focused on Modern Standard Arabic (MSA), which is the most widely used form. Since then, work has been carried out on several different dialects, including Egyptian, Levantine and Moroccan. Moreover, a number of datasets have been created to support such work. However, up until now, less work has been carried out on Sudanese Arabic, a dialect which has 32 million speakers. In this paper, two new publicly available datasets are introduced, the 2-Class Sudanese Sentiment Dataset (SudSenti2) and the 3-Class Sudanese Sentiment Dataset (SudSenti3). Furthermore, a CNN architecture, SCM, is proposed, comprising five CNN layers together with a novel pooling layer, MMA, to extract the best features. This SCM+MMA model is applied to SudSenti2 and SudSenti3, with accuracies of 92.75% and 84.39%, respectively. Next, the model is compared to other deep learning classifiers and shown to be superior on these new datasets. Finally, the proposed model is applied to the existing Saudi Sentiment Dataset and to the MSA Hotel Arabic Review Dataset, with accuracies of 85.55% and 90.01%, respectively.
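The abstract names the MMA pooling layer but does not define it. As a purely illustrative sketch, the snippet below assumes MMA means taking the minimum, maximum and average of each channel over the time axis and concatenating the three results. That reading, the function name `mma_pool`, and the toy values are all assumptions made here, not details confirmed by the abstract.

```python
from statistics import fmean
from typing import List, Sequence


def mma_pool(feature_map: Sequence[Sequence[float]]) -> List[float]:
    """Pool a (time_steps x channels) feature map into a single vector.

    ASSUMPTION: "MMA" is read here as min, max and average pooling over
    the time axis, concatenated. The abstract does not define the layer,
    so this is a hypothetical interpretation for illustration only.
    """
    if not feature_map:
        raise ValueError("feature_map must contain at least one time step")
    # Transpose so that each tuple holds one channel's values across time.
    channels = list(zip(*feature_map))
    mins = [min(c) for c in channels]
    maxs = [max(c) for c in channels]
    avgs = [fmean(c) for c in channels]
    # Concatenate the three pooled views: the output has 3 * channels entries.
    return mins + maxs + avgs


# Toy feature map with 3 time steps and 2 channels (made-up values).
toy = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
pooled = mma_pool(toy)
print(pooled)  # [1.0, 0.0, 3.0, 4.0, 2.0, 2.0]
```

Under this assumed reading, the pooled vector is three times as wide as the channel dimension, so a downstream classifier sees the extremes and the mean of each convolutional feature instead of only its maximum.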

