SI, CL
Apr 9, 2020

Large Arabic Twitter Dataset on COVID-19

arXiv:2004.04315v2
90 citations
AI Analysis

This dataset addresses a gap for researchers and policymakers studying COVID-19's impact in Arabic-speaking regions, though it is incremental as it applies an existing data collection method to a new domain.

The authors tackled the lack of Arabic social media data for COVID-19 research by collecting what they describe as the first large dataset of Arabic tweets on the pandemic. At the time of writing, the pandemic had caused over 2.5 million confirmed cases and 180,000 fatalities globally. The dataset enables analysis of societal issues and online behaviors.

The 2019 coronavirus disease (COVID-19), which emerged in late December 2019 in China, is now rapidly spreading across the globe. At the time of writing this paper, the number of global confirmed cases has passed two and a half million, with over 180,000 fatalities. Many countries have enforced strict social distancing policies to contain the spread of the virus. This has changed the daily life of tens of millions of people and pushed people to move their discussions online, e.g., via online social media sites like Twitter. In this work, we describe the first Arabic tweet dataset on COVID-19, which we have been collecting since January 1st, 2020. The dataset would help researchers and policy makers study different societal issues related to the pandemic. Many other tasks related to behavioral change, information sharing, misinformation, and the spread of rumors can also be analyzed.

Code Implementations: 1 repo