Multiple Document Representations from News Alerts for Automated Bio-surveillance Event Detection
This work aids bio-surveillance analysts in disease tracking by improving automated event detection from online documents; the contribution is incremental, since it builds on existing methods.
The paper tackles automated detection of public health events from news alerts by developing document classification methods. Its best model, logistic regression on the combined output of a pair of bidirectional recurrent neural networks, achieves 97% recall and 93.3% accuracy.
Due to globalization, geographic boundaries no longer serve as effective shields against the spread of infectious diseases. To aid bio-surveillance analysts in disease tracking, recent research has been devoted to developing information retrieval and analysis methods that utilize the vast corpora of publicly available documents on the internet. In this work, we present methods for the automated retrieval and classification of documents related to active public health events. We demonstrate classification performance on an auto-generated corpus, using recurrent neural network, TF-IDF, and Naive Bayes log-count ratio document representations. By jointly modeling the title and description of a document, we achieve 97% recall and 93.3% accuracy with our best performing bio-surveillance event classification model: logistic regression on the combined output from a pair of bidirectional recurrent neural networks.
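As an illustration of one of the representations named above, the following is a minimal sketch of Naive Bayes log-count ratio weights together with the two reported metrics, recall and accuracy. It is not the implementation used in this work: the function names, the whitespace tokenizer, the add-one smoothing constant `alpha`, and the four-document toy corpus are all assumptions made for the example, and the best performing model reported here is based on recurrent neural networks rather than on these features.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def log_count_ratio(docs, labels, alpha=1.0):
    """Return {token: log ratio} of smoothed, normalized token counts
    in positive (label 1) versus negative (label 0) documents."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label == 1 else neg).update(tokenize(doc))
    # Add-alpha smoothing keeps every ratio finite.
    pos_total = sum(pos[t] + alpha for t in vocab)
    neg_total = sum(neg[t] + alpha for t in vocab)
    return {
        t: math.log(((pos[t] + alpha) / pos_total) / ((neg[t] + alpha) / neg_total))
        for t in vocab
    }


def recall(y_true, y_pred):
    """Fraction of true positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy corpus: title and description joined into one string.
docs = [
    "outbreak of cholera reported",
    "cholera cases rise",
    "football match result",
    "stock market result",
]
labels = [1, 1, 0, 0]
ratios = log_count_ratio(docs, labels)
```

Tokens that occur mostly in event-related documents (here `cholera`) receive positive weights, and tokens that occur mostly in unrelated documents (here `result`) receive negative weights. In a full pipeline these weights would scale the entries of a document's bag-of-words vector before a linear classifier such as logistic regression is fit.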