CL LG, Nov 1, 2020

ASAD: A Twitter-based Benchmark Arabic Sentiment Analysis Dataset

Basma Alharbi, Hind Alamro, Manal Alshehri, Zuhair Khayyat, Manal Kalkatawi, Inji Ibrahim Jaber, Xiangliang Zhang

arXiv:2011.00578v3 1.037 citations

Originality: Synthesis-oriented

AI Analysis

This dataset is a valuable resource for researchers and practitioners in Arabic NLP, though the contribution is incremental because it builds on existing dataset efforts.

The authors tackled the lack of large, high-quality datasets for Arabic sentiment analysis by creating ASAD, a Twitter-based benchmark dataset with 95K tweets annotated into three sentiment classes, and they implemented baseline models to provide reference results for a competition.

This paper provides a detailed description of a new Twitter-based benchmark dataset for Arabic Sentiment Analysis (ASAD), launched in a competition sponsored by KAUST that awards 10,000 USD, 5,000 USD, and 2,000 USD to the first-, second-, and third-place winners, respectively. Compared to other publicly released Arabic datasets, ASAD is a large, high-quality annotated dataset (95K tweets) with three-class sentiment labels (positive, negative, and neutral). We present the details of the data collection and annotation processes. In addition, we implement several baseline models for the competition task and report the results as a reference for competition participants.
