IR LG Jan 5, 2021

COVID-19: Comparative Analysis of Methods for Identifying Articles Related to Therapeutics and Vaccines without Using Labeled Data

Mihir Parmar, Ashwin Karthik Ambalavanan, Hong Guan, Rishab Banerjee, Jitesh Pabla, Murthy Devarakonda

arXiv:2101.02017v12.0

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficiently screening relevant scientific literature for researchers and policymakers interested in COVID-19 therapeutics and vaccines, offering an incremental improvement in unsupervised classification.

This paper compared six transfer-learning and unsupervised methods for identifying COVID-19 vaccine and therapeutic articles without labeled data. It found that a BERT model, while generally effective, misclassified relevant abstracts lacking task-specific terms, leading to the development of a more effective unsupervised ensemble.

Here we proposed an approach to analyze text classification methods based on the presence or absence of task-specific terms (and their synonyms) in the text. We applied this approach to study six different transfer-learning and unsupervised methods for screening articles relevant to COVID-19 vaccines and therapeutics. The analysis revealed that while a BERT model trained on search-engine results generally performed well, it misclassified relevant abstracts that did not contain task-specific terms. We used this insight to create a more effective unsupervised ensemble.
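As a rough illustration of the term-presence analysis described in the abstract, the sketch below splits relevant abstracts by whether they contain a task-specific term and reports a classifier's recall on each group. The term list, the function names, and the choice of recall as the metric are assumptions made for this example; the abstract does not specify the terms, synonyms, or metrics the authors used.

```python
import re
from typing import Dict, Optional, Sequence

# Hypothetical task-specific terms and synonyms (not taken from the paper).
TASK_TERMS = (
    "vaccine", "vaccination", "immunization",
    "therapeutic", "therapy", "treatment", "drug",
)

# Whole-word, case-insensitive match, allowing a simple plural "s".
_TERM_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, TASK_TERMS)) + r")s?\b",
    re.IGNORECASE,
)


def has_task_terms(text: str) -> bool:
    """Return True if the text mentions any task-specific term."""
    return _TERM_RE.search(text) is not None


def recall_by_term_presence(
    abstracts: Sequence[str],
    gold: Sequence[bool],
    predicted: Sequence[bool],
) -> Dict[str, Optional[float]]:
    """Recall on relevant abstracts, split by presence of task terms.

    Returns None for a group that contains no relevant abstracts.
    """
    counts = {True: [0, 0], False: [0, 0]}  # group -> [hits, total]
    for text, is_relevant, is_predicted in zip(abstracts, gold, predicted):
        if not is_relevant:
            continue  # recall only considers truly relevant abstracts
        bucket = counts[has_task_terms(text)]
        bucket[1] += 1
        bucket[0] += int(is_predicted)
    return {
        ("with_terms" if group else "without_terms"): (hits / total if total else None)
        for group, (hits, total) in counts.items()
    }
```

A large gap between the two recall values would point to the kind of failure the abstract reports for the BERT model: relevant abstracts missed because they lack explicit task terms.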
