COVID-19: Comparative Analysis of Methods for Identifying Articles Related to Therapeutics and Vaccines without Using Labeled Data
This work addresses the problem of efficiently screening relevant scientific literature for researchers and policymakers interested in COVID-19 therapeutics and vaccines, offering an incremental improvement in unsupervised classification.
This paper compared six transfer-learning and unsupervised methods for identifying COVID-19 vaccine and therapeutic articles without labeled data. It found that a BERT model, while generally effective, misclassified relevant abstracts lacking task-specific terms; this finding guided the development of a more effective unsupervised ensemble.
Here we propose an approach for analyzing text classification methods based on the presence or absence of task-specific terms (and their synonyms) in the text. We applied this approach to six different transfer-learning and unsupervised methods for screening articles relevant to COVID-19 vaccines and therapeutics. The analysis revealed that, while a BERT model trained on search-engine results generally performed well, it misclassified relevant abstracts that did not contain task-specific terms. We used this insight to create a more effective unsupervised ensemble.
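The term-presence analysis can be illustrated with a minimal sketch. It assumes a small set of relevance judgments used only for evaluation (not for training), a hypothetical lexicon `TASK_TERMS`, and hypothetical helper names (`build_term_pattern`, `contains_task_terms`, `recall_by_term_presence`). Whole-word regex matching is a simplification and may differ from the synonym handling used in the paper. The sketch splits the relevant abstracts into those that mention a task term and those that do not, then reports a classifier's recall separately for each group, which is how a failure on term-less relevant abstracts becomes visible.

```python
import re

# Hypothetical lexicon of task-specific terms and synonyms (illustrative only;
# not the term list used in the paper).
TASK_TERMS = {
    "vaccine": ["vaccine", "vaccines", "vaccination", "immunization"],
    "therapeutic": ["therapeutic", "therapeutics", "treatment", "drug", "antiviral"],
}


def build_term_pattern(term_groups):
    """Compile one case-insensitive, whole-word regex covering every synonym."""
    words = sorted({w for syns in term_groups.values() for w in syns}, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)


def contains_task_terms(text, pattern):
    """True if the text mentions at least one task-specific term or synonym."""
    return pattern.search(text) is not None


def recall_by_term_presence(abstracts, labels, predictions, pattern):
    """Recall on relevant abstracts, split by whether they contain task terms.

    labels and predictions are booleans (True = relevant). A group with no
    relevant abstracts gets None instead of a recall value.
    """
    hits = {"with_terms": 0, "without_terms": 0}
    totals = {"with_terms": 0, "without_terms": 0}
    for text, is_relevant, predicted in zip(abstracts, labels, predictions):
        if not is_relevant:
            continue  # recall only considers truly relevant abstracts
        group = "with_terms" if contains_task_terms(text, pattern) else "without_terms"
        totals[group] += 1
        hits[group] += int(predicted)
    return {g: (hits[g] / totals[g] if totals[g] else None) for g in totals}


if __name__ == "__main__":
    pattern = build_term_pattern(TASK_TERMS)
    abstracts = [
        "We evaluate an mRNA vaccine candidate against SARS-CoV-2.",
        "Remdesivir shortened time to recovery in hospitalized patients.",
        "We model the spread of COVID-19 in urban areas.",
    ]
    labels = [True, True, False]       # toy relevance judgments
    predictions = [True, False, False]  # toy classifier output
    print(recall_by_term_presence(abstracts, labels, predictions, pattern))
    # prints {'with_terms': 1.0, 'without_terms': 0.0}
```

In this toy example the second abstract is relevant to therapeutics but never uses a lexicon term, so a classifier that relies on such terms misses it and recall drops to 0.0 for the `without_terms` group. That is the kind of pattern the analysis is designed to surface.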