CL, Sep 8, 2021

Unsupervised Text Mining of COVID-19 Records

arXiv:2110.07357v10.2

Originality: Synthesis-oriented

AI Analysis

This work provides a preprocessed dataset for researchers working on COVID-19 text mining, but the contribution is incremental: it builds on existing data without introducing new methods.

The paper tackles the problem of analyzing COVID-19-related data by preprocessing and annotating the CORD-19 dataset for supervised classification tasks and making it publicly available to aid research on social interventions during the pandemic.

Since the coronavirus outbreak began, the disease has spread worldwide and drastically changed many aspects of human life. Twitter, as a powerful tool, can help researchers measure public health in response to COVID-19. Given the high volume of data produced on social networks, automated text mining approaches can help search, read, and summarize useful information. This paper preprocesses CORD-19, an existing medical dataset on COVID-19, and annotates it for supervised classification tasks. We release the preprocessed dataset to the research community during the COVID-19 pandemic; it may contribute to finding new solutions for some of the social interventions that COVID-19 has prompted. The preprocessed version of the dataset is publicly available on GitHub.

