SI · IR · Apr 17, 2020

NAIST COVID: Multilingual COVID-19 Twitter and Weibo Dataset

Zhiwei Gao, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki

arXiv:2004.08145v1 · 10.3 · 21 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This dataset addresses the scarcity of social media resources for researchers studying COVID-19 communication, although the contribution is incremental because it builds on existing data collection efforts.

The authors addressed the need for accessible social media data during the COVID-19 pandemic by releasing a multilingual dataset of Twitter and Weibo posts from January to March 2020, along with quantitative and qualitative analyses, such as daily word clouds, to aid understanding of public communication.

Since the outbreak of coronavirus disease 2019 (COVID-19) in late 2019, the disease has affected over 200 countries and billions of people worldwide. It has disrupted people's social lives through measures such as "social distancing" and "stay at home," which have in turn increased interaction through social media. Given that social media can provide valuable information about COVID-19 at a global scale, it is important to share such data and encourage social media studies of COVID-19 and other infectious diseases. We have therefore released a multilingual dataset of COVID-19-related social media posts, consisting of English and Japanese microblogs from Twitter and Chinese microblogs from Weibo. The data cover microblogs posted from January 20, 2020, to March 24, 2020. This paper also provides quantitative and qualitative analyses of these datasets, creating daily word clouds as an example of text mining. The dataset is available on GitHub. It can be analyzed in many ways and is expected to help communicate COVID-19 precautions more efficiently.
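A daily word cloud is essentially a per-day word frequency table rendered graphically. The following is a minimal Python sketch of that counting step, assuming posts have already been loaded as (date, text) pairs. The record format, the small stopword list, and the function names (`tokenize_english`, `daily_word_counts`, `top_words`) are hypothetical and are not taken from the released dataset or the authors' code. The tokenizer handles English only; the Japanese and Chinese subsets would need a word segmenter, since those languages do not separate words with spaces.

```python
from collections import Counter, defaultdict
from datetime import date
import re

# Illustrative stopword list (an assumption, not the authors' list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "at"}

# Keep letters, digits, hashtags, mentions, and apostrophes as token characters.
TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def tokenize_english(text):
    """Lowercase, split on other characters, and drop stopwords (English only)."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS]


def daily_word_counts(posts, start, end):
    """Count word frequencies per day for (date, text) posts within [start, end]."""
    counts = defaultdict(Counter)
    for day, text in posts:
        if start <= day <= end:
            counts[day].update(tokenize_english(text))
    return dict(counts)


def top_words(counter, k=5):
    """Return the k most frequent words; these would be drawn largest in a cloud."""
    return counter.most_common(k)


# Usage with made-up posts (hypothetical data, not from the dataset).
posts = [
    (date(2020, 1, 20), "Stay at home and wash your hands"),
    (date(2020, 1, 20), "Wash hands often"),
    (date(2020, 3, 25), "Outside the collection window"),
]
counts = daily_word_counts(posts, date(2020, 1, 20), date(2020, 3, 24))
print(top_words(counts[date(2020, 1, 20)], 2))  # [('wash', 2), ('hands', 2)]
```

The date bounds mirror the January 20 to March 24, 2020 collection window, so the third post is excluded. Feeding each day's `Counter` into a word-cloud renderer would then produce one image per day.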

