DL IR · Apr 30, 2020

Getting Insights from a Large Corpus of Scientific Papers on Specialisted Comprehensive Topics -- the Case of COVID-19

arXiv:2005.00485v12.31 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the need for reliable information extraction from scientific papers to combat fake news, though it is incremental in applying existing NLP techniques to a new dataset.

The paper tackled the challenge of analyzing a large corpus of 24,000 COVID-19 scientific papers by developing two NLP and graph-based methods to extract insights on specific sub-topics like virus origin and drug uses, enabling automatic computer-assisted analysis.

COVID-19 is one of the most important topics these days, particularly on search engines and in the news. While fake news is easily shared, scientific papers are reliable sources from which information can be extracted. With about 24,000 scientific publications on COVID-19 and related research on PubMed, automatic computer-assisted analysis is required. In this paper, we develop two methodologies to get insights into specific sub-topics of interest and the latest research sub-topics. They rely on natural language processing and graph-based visualizations. We run these methodologies on two cases: the virus origin and the uses of existing drugs.
