CL · AI · Apr 29, 2020

TUNIZI: a Tunisian Arabizi sentiment analysis Dataset

arXiv:2004.14303v1 · 22 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the scarcity of data for sentiment analysis in Tunisian Arabizi, an informal dialect used on social media. The contribution is incremental: it primarily provides a new dataset, without novel methodological advances.

The paper tackles the lack of annotated data for Tunisian Arabizi sentiment analysis by introducing TUNIZI, a manually annotated dataset collected from social networks, resulting in the first such resource for this low-resource dialect.

On social media, Arabic speakers tend to express themselves in their own local dialects. Tunisians in particular use an informal form called "Tunisian Arabizi". Analytical studies seek to explore and recognize online opinions in order to exploit them for planning and prediction purposes, such as measuring customer satisfaction and establishing sales and marketing strategies. However, analytical studies based on Deep Learning are data-hungry, while African languages and dialects are considered low-resource languages. For instance, to the best of our knowledge, no annotated Tunisian Arabizi dataset exists. In this paper, we introduce TUNIZI, a Tunisian Arabizi sentiment analysis dataset collected from social networks, preprocessed for analytical studies, and annotated manually by native Tunisian speakers.

Code Implementations · 3 repos
