CL · IR · SI · Apr 13, 2020

ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks

arXiv:2004.05861v4 · 822 citations
Originality: Synthesis-oriented
AI Analysis

ArCOV-19 is a valuable resource for researchers studying COVID-19 discourse in Arabic, though the contribution is incremental: it extends existing dataset curation efforts to a new language and topic.

The authors tackled the lack of publicly available Arabic Twitter data on COVID-19 by creating ArCOV-19, a dataset with about 2.7 million tweets and propagation networks, enabling research in NLP, information retrieval, and social computing.

In this paper, we present ArCOV-19, an Arabic COVID-19 Twitter dataset that spans one year, covering the period from the 27th of January 2020 to the 31st of January 2021. ArCOV-19 is the first publicly available Arabic Twitter dataset covering the COVID-19 pandemic; it includes about 2.7M tweets alongside the propagation networks of the most popular subset of them (i.e., the most retweeted and most liked). The propagation networks include both retweets and conversational threads (i.e., threads of replies). ArCOV-19 is designed to enable research in several domains, including natural language processing, information retrieval, and social computing. Preliminary analysis shows that ArCOV-19 captures rising discussions associated with the first reported cases of the disease as they appeared in the Arab world. In addition to the source tweets and propagation networks, we also release the search queries and the language-independent crawler used to collect the tweets, to encourage the curation of similar datasets.
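To make the structure described in the abstract concrete, the following is a minimal Python sketch of how a propagation network (retweets plus a thread of replies) might be assembled for a popular source tweet. It is not the authors' crawler or the dataset's actual schema: the `Tweet` fields, the function names, and the choice to rank popularity by the sum of retweets and likes are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tweet:
    # All field names are hypothetical, not the dataset's real schema.
    id: str
    retweet_of: Optional[str] = None  # id of the retweeted tweet, if any
    reply_to: Optional[str] = None    # id of the tweet being replied to, if any
    retweets: int = 0
    likes: int = 0


def most_popular(tweets, k):
    """Return the k original tweets with the highest retweets + likes (assumed ranking)."""
    originals = [t for t in tweets if t.retweet_of is None and t.reply_to is None]
    return sorted(originals, key=lambda t: t.retweets + t.likes, reverse=True)[:k]


def propagation_network(source_id, tweets):
    """Collect the retweets of source_id and the reply thread rooted at it."""
    retweets = [t.id for t in tweets if t.retweet_of == source_id]

    # Index replies by the tweet they answer.
    children = defaultdict(list)
    for t in tweets:
        if t.reply_to is not None:
            children[t.reply_to].append(t.id)

    # Breadth-first walk: direct replies, then replies to replies, and so on.
    thread, queue = [], deque([source_id])
    while queue:
        parent = queue.popleft()
        for child in children[parent]:
            thread.append((parent, child))
            queue.append(child)

    return {"source": source_id, "retweets": retweets, "replies": thread}


# Toy data, invented for this example only.
sample = [
    Tweet("a", retweets=5, likes=3),
    Tweet("b", retweets=1, likes=1),
    Tweet("r1", retweet_of="a"),
    Tweet("c1", reply_to="a"),
    Tweet("c2", reply_to="c1"),
    Tweet("c3", reply_to="b"),
]
top = most_popular(sample, 1)
network = propagation_network(top[0].id, sample)
```

In this toy example the reply thread is stored as (parent, child) edges, so both the flat retweet list and the nested conversation can be recovered from one record per source tweet.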

Code Implementations: 1 repo
