CL · Apr 13, 2015

Egyptian Dialect Stopword List Generation from Social Network Data

arXiv:1508.02060v1 · 8 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific need for more accurate natural language processing tools in Arabic dialects, but it is incremental as it adapts existing stopword methods to a new dialect.

The paper tackles sentiment analysis for the Egyptian Dialect in online social networks by generating a stopword list from social media data. Removing these stopwords yields better classification performance than using Modern Standard Arabic stopword lists alone.

This paper proposes a methodology for generating a stopword list from online social network (OSN) corpora in Egyptian Dialect (ED). The aim of the paper is to investigate the effect of removing ED stopwords on the Sentiment Analysis (SA) task. Previously generated stopword lists were built for Modern Standard Arabic (MSA), which is not the common language used in OSNs. We have generated a stopword list for the Egyptian dialect to be used with OSN corpora. We compare the efficiency of text classification when using the generated list, when using previously generated MSA lists, and when combining the Egyptian dialect list with the MSA list. Text classification was performed using Naïve Bayes and Decision Tree classifiers and two feature selection approaches, unigram and bigram. The experiments show that removing ED stopwords gives better performance than using lists of MSA stopwords only.
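
As a rough illustration of the pipeline the abstract describes (build a stopword list from a corpus, remove the stopwords, then extract unigram or bigram features), here is a minimal Python sketch. The abstract does not say which criterion the authors use to select stopwords, so document frequency is an assumption made for illustration only. The function names `candidate_stopwords` and `ngram_features`, the `min_doc_ratio` and `top_k` parameters, and the toy transliterated posts are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def candidate_stopwords(docs, min_doc_ratio=0.5, top_k=50):
    """Return tokens that occur in at least min_doc_ratio of the documents.

    Assumption: document frequency is the selection signal. The abstract
    does not state the paper's actual criterion.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc.split()))  # count each token once per document
    threshold = min_doc_ratio * len(docs)
    frequent = [(tok, n) for tok, n in doc_freq.most_common() if n >= threshold]
    return [tok for tok, _ in frequent[:top_k]]


def ngram_features(doc, stopwords, n=1):
    """Drop stopwords, then emit n-grams (n=1 unigram, n=2 bigram) as tuples."""
    tokens = [t for t in doc.split() if t not in stopwords]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy, made-up posts (placeholder tokens, not data from the paper).
posts = [
    "da film helw awi",
    "el akl da wehesh awi",
    "da el mat3am helw",
    "el film da tawil awi",
]

ed_stopwords = set(candidate_stopwords(posts, min_doc_ratio=0.75))
unigrams = ngram_features(posts[0], ed_stopwords, n=1)
bigrams = ngram_features(posts[0], ed_stopwords, n=2)
```

The combined ED plus MSA condition mentioned in the abstract could be approximated by taking the set union of `ed_stopwords` with an MSA list loaded separately. The resulting features would then be passed to a Naïve Bayes or Decision Tree classifier.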

