CL, Apr 24, 2019

Toponym Identification in Epidemiology Articles - A Deep Learning Approach

MohammadReza Davari, Leila Kosseim, Tien D. Bui

arXiv:1904.11018v2, 0.37 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for fine-grained localization in epidemiology, where finding locations currently requires reading articles by hand. It is an incremental advance: it builds on existing deep learning methods and adds domain-specific enhancements.

The paper automates place name identification in epidemiology articles to help track the spread of viruses. Its best model achieves an F1 score of 80.13%, compared with the previous state of the art of 69.84%, a gain of 10.29 percentage points.

When analyzing the spread of viruses, epidemiologists often need to identify the location of infected hosts. This information can be found in public databases such as GenBank; however, the information provided in these databases is usually limited to the country or state level. More fine-grained localization requires phylogeographers to manually read the relevant scientific articles. In this work, we propose an approach to automate place name identification in medical (epidemiology) articles. The focus of this paper is to propose a deep learning-based model for toponym detection and to experiment with the use of external linguistic features and domain-specific information. The model was evaluated on a collection of 105 epidemiology articles from PubMed Central provided by the recent SemEval task 12. Our best detection model achieves an F1 score of $80.13\%$, a significant improvement over the state of the art of $69.84\%$. These results underline the importance of domain-specific embeddings as well as specific linguistic features for toponym detection in medical journals.
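The abstract reports results as F1 scores. As a rough illustration of how such a score can be computed for toponym detection, the sketch below scores predicted character spans against gold spans using strict exact matching. The matching criterion, the function name `precision_recall_f1`, and the toy spans are all assumptions made for illustration; the official SemEval scorer may define matches differently.

```python
from typing import Set, Tuple

# A toponym mention as (start, end) character offsets in a document.
# This representation is an assumption made for illustration.
Span = Tuple[int, int]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Score predicted spans against gold spans with strict exact matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: two of three predictions exactly match a gold span.
gold_spans = {(0, 5), (20, 29), (47, 52)}
predicted_spans = {(0, 5), (20, 29), (60, 66)}

p, r, f1 = precision_recall_f1(gold_spans, predicted_spans)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Under strict matching, a prediction that overlaps a gold toponym without sharing both boundaries counts as both a false positive and a false negative, so scores computed this way are typically lower than under overlap-based matching.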
