SI · CL · Apr 9, 2020

Large Arabic Twitter Dataset on COVID-19

Sarah Alqurashi, Ahmad Alhindi, Eisa Alanazi

arXiv:2004.04315v2 · 15.590 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This dataset addresses a gap for researchers and policymakers studying COVID-19's impact in Arabic-speaking regions, though it is incremental as it applies an existing data collection method to a new domain.

The authors address the lack of Arabic social media data for COVID-19 research by collecting what they describe as the first Arabic tweets dataset on the pandemic, gathered since January 1st, 2020. At the time of writing, the pandemic had passed 2.5 million confirmed cases and 180,000 fatalities globally. The dataset is intended to support analysis of societal issues and online behaviors.

The 2019 coronavirus disease (COVID-19), which emerged in late December 2019 in China, is now rapidly spreading across the globe. At the time of writing this paper, the number of global confirmed cases has passed two and a half million, with over 180,000 fatalities. Many countries have enforced strict social distancing policies to contain the spread of the virus. This has changed the daily lives of tens of millions of people and pushed them to move their discussions online, e.g., to social media sites like Twitter. In this work, we describe the first Arabic tweets dataset on COVID-19, which we have been collecting since January 1st, 2020. The dataset should help researchers and policy makers study different societal issues related to the pandemic. It can also support many other tasks related to behavioral change, information sharing, and the spread of misinformation and rumors.
