COVID-19 Kaggle Literature Organization
This work addresses the challenge scientists and researchers face in efficiently managing and accessing COVID-19-related papers, though the contribution is incremental because it applies existing methods to a new dataset.
The authors tackle the problem of organizing the rapidly growing COVID-19 scientific literature by developing a machine learning approach that groups similar papers. The result is a publicly available proof of concept, built on the CORD-19 dataset, that simplifies topic navigation.
In 2020, the world faced the devastating outbreak of coronavirus disease 2019 (COVID-19), caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Research on the subject was fast-tracked to the point that scientists struggled to keep up with new findings. This growth in the scientific literature created a need to organize those documents. We describe an approach to organize and visualize the scientific literature on or related to COVID-19 using machine learning techniques so that papers on similar topics are grouped together, which simplifies the navigation of topics and related papers. We implemented this approach using the widely recognized CORD-19 dataset to present a publicly available proof of concept.
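The abstract does not name the specific machine learning techniques used, so the following is only a minimal sketch of what "grouping papers on similar topics" can look like in practice. It assumes TF-IDF term weighting and a greedy cosine-similarity grouping as stand-ins for whatever the authors actually used. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `group_papers`), the `threshold` parameter, and the four toy abstracts are all hypothetical and are not taken from CORD-19.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphabetic tokens of three or more letters.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= 3]


def tfidf_vectors(docs):
    # Build one sparse, L2-normalized TF-IDF vector (term -> weight) per document.
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Both vectors are already L2-normalized, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def group_papers(docs, threshold=0.2):
    # Greedy single pass (an illustrative assumption, not the authors' method):
    # each paper joins the first group whose seed paper is similar enough,
    # otherwise it starts a new group.
    vectors = tfidf_vectors(docs)
    groups = []  # each group is a list of document indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Toy abstracts invented for illustration only.
abstracts = [
    "Vaccine trial shows strong antibody response after vaccine booster dose",
    "Antibody response to vaccine candidates in a phase two trial",
    "Mask wearing and social distancing reduce transmission in schools",
    "Transmission in schools falls with mask wearing and social distancing policies",
]

groups = group_papers(abstracts)
print(groups)
```

On this toy input the two vaccine abstracts end up in one group and the two transmission abstracts in another. A corpus the size of CORD-19 would likely call for a more scalable clustering method and a dimensionality-reduction step for visualization, neither of which this sketch attempts.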