CL · May 11, 2023

A Novel Dataset Towards Extracting Virus-Host Interactions

Rasha Alshawi, Atriya Sen, Nathan S. Upham, Beckett Sterner

arXiv:2305.13317v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for automated extraction of host-pathogen detection methods from scientific literature, with potential applications in predicting viral spillover risk for human health.

The authors introduce a new manually annotated named-entity recognition (NER) dataset focused on virus-host interactions and report initial results from applying pre-trained models to it.

We describe a novel dataset for the automated recognition of named taxonomic and other entities relevant to the association of viruses with their hosts. We further report initial results from applying pre-trained models to the named-entity recognition (NER) task on this dataset. We propose that our dataset of manually annotated abstracts offers a Gold Standard Corpus for training future NER models to automatically extract host-pathogen detection methods from scientific publications, and we explain how our work takes first steps toward automatically predicting viral spillover risk, an important human health-related concept, from the scientific literature.
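The abstract does not specify how the entity annotations are stored, so the following is only a minimal sketch of what "manually annotated abstracts for NER" commonly means in practice: character-offset spans converted into per-token BIO tags (B- begins an entity, I- continues it, O is outside any entity). The BIO scheme, whitespace tokenization, the labels `VIRUS`, `HOST`, and `METHOD`, the helper `phrase_span`, and the example sentence are all hypothetical illustrations, not the dataset's actual format or label set.

```python
from typing import List, Tuple

# Hypothetical span annotation: (start, end, label), end exclusive (character offsets).
Span = Tuple[int, int, str]


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace, returning (token, start, end) character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # only whitespace lies between pos and the token
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def phrase_span(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: build a span from the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag each token B-<label>, I-<label>, or O based on the span it falls in."""
    tagged = []
    for tok, start, end in tokenize(text):
        tag = "O"
        for s_start, s_end, label in spans:
            if start >= s_start and end <= s_end:
                tag = ("B-" if start == s_start else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


if __name__ == "__main__":
    # Made-up sentence for illustration only; not drawn from the dataset.
    text = "Rabies virus was detected in Eptesicus fuscus by RT-PCR"
    spans = [
        phrase_span(text, "Rabies virus", "VIRUS"),
        phrase_span(text, "Eptesicus fuscus", "HOST"),
        phrase_span(text, "RT-PCR", "METHOD"),
    ]
    for token, tag in spans_to_bio(text, spans):
        print(f"{token}\t{tag}")
```

Token-level tags of this kind are the usual input format for fine-tuning pre-trained token-classification models; a real pipeline would use a proper tokenizer that handles punctuation and subword units rather than splitting on whitespace.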

