Information Mining for COVID-19 Research From a Large Volume of Scientific Literature
This work aims to expedite COVID-19 research by giving scientists a structured reference drawn from the literature. Its contribution is incremental, since it applies existing graph methods to a new dataset.
The authors address the problem of extracting key information from a large volume of COVID-19 scientific literature. They build a graph-based model from 10,683 article abstracts and use it to identify important keywords related to transmission, drug types, and genome research, revealing insights into antiviral drugs, pathogen-hosts, and proteins.
The year 2020 has seen an unprecedented COVID-19 pandemic due to the outbreak of a novel strain of coronavirus in 180 countries. Many scientists are working around the clock in a desperate effort to discover new drugs and vaccines for COVID-19. Their valuable time and effort may benefit from computer-based mining of the large volume of health science literature, which is a treasure trove of information. In this paper, we develop a graph-based model using the abstracts of 10,683 scientific articles to find key information on three topics related to coronavirus: transmission, drug types, and genome research. A subgraph is built for each of the three topics to extract more topic-focused information. Within each subgraph, we use betweenness centrality to rank the importance of keywords related to drugs, diseases, pathogens, hosts of pathogens, and biomolecules. The results reveal intriguing information about antiviral drugs (chloroquine, amantadine, dexamethasone), pathogen-hosts (pigs, bats, macaque, cynomolgus), viral pathogens (Zika, dengue, malaria, and several viruses in the Coronaviridae family), and proteins and therapeutic mechanisms (oligonucleotide, interferon, glycoprotein) in connection with the core topic of coronavirus. The categorical summary of these keywords and topics may be a useful reference to expedite COVID-19 research and to recommend new and alternative directions for it.
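The ranking step described above can be illustrated with a minimal sketch. The abstract does not say how the graph is constructed, so this sketch makes several assumptions: nodes are keywords, an edge links two keywords that appear in the same abstract, and a topic subgraph is the topic keyword plus its direct neighbours. The function names and the toy keyword lists are hypothetical and are not taken from the paper. Betweenness centrality is computed with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Assumed construction: link keywords that share an abstract."""
    graph = defaultdict(set)
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def topic_subgraph(graph, topic):
    """Assumed subgraph: the topic node, its neighbours, and edges among them."""
    keep = {topic} | graph[topic]
    return {node: graph[node] & keep for node in keep}


def betweenness_centrality(graph):
    """Brandes' algorithm; returns unnormalised scores for an undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting the farthest nodes first.
        delta = dict.fromkeys(graph, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Toy keyword lists standing in for abstracts (illustrative only).
docs = [
    ["coronavirus", "transmission", "bats"],
    ["coronavirus", "chloroquine"],
    ["transmission", "pigs"],
    ["chloroquine", "malaria"],
]
graph = build_cooccurrence_graph(docs)
full_scores = betweenness_centrality(graph)
topic_scores = betweenness_centrality(topic_subgraph(graph, "coronavirus"))
ranked = sorted(full_scores, key=lambda k: (-full_scores[k], k))
print(ranked)
```

A keyword scores highly when many shortest paths between other keywords pass through it, which is why the measure surfaces terms that bridge otherwise separate groups of keywords. On a real corpus a graph library would be the more practical choice; the pure-Python version here only keeps the sketch self-contained.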