SIML · Oct 20, 2015

A latent shared-component generative model for real-time disease surveillance using Twitter data

arXiv:1510.05981v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses timely disease monitoring for public health, but it is incremental as it builds on existing methods for social media data analysis.

The paper tackles real-time dengue surveillance in small geographical areas. It develops a generative model that links fluctuations in disease cases to Twitter posts, and shows empirically, using data from large Brazilian towns, that the model predicts disease counts for the following weeks well.

Exploiting the large amount of available data to address relevant social problems has been one of the key challenges in data mining. Such efforts have recently been named "data science for social good" and have attracted the attention of several researchers and institutions. In this paper we contribute to this objective by considering a difficult public health problem: the timely monitoring of dengue epidemics in small geographical areas. We develop a simple yet effective generative model that connects the fluctuations of disease cases with disease-related Twitter posts. We consider a hidden Markov process that drives both the fluctuations in reported dengue cases and the tweets issued in each region. We add a stable but random source of tweets to represent the posts made when no disease cases are recorded. The model is learned through a Markov chain Monte Carlo algorithm that produces the posterior distribution of the relevant parameters. Using data from a significant number of large Brazilian towns, we demonstrate empirically that our model predicts the disease counts of the following weeks well when tweets and disease cases are used jointly.
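To make the shared-component idea concrete, the following is a minimal simulation sketch and not the paper's actual specification. The abstract does not state the distributions used, so everything here is an assumption made for illustration: the latent process is taken to be a Gaussian AR(1) on the log scale, both observed series are taken to be Poisson, and all function names, parameter names, and default values (`simulate_region`, `persistence`, `case_scale`, `tweet_scale`, `baseline_tweet_rate`, and so on) are hypothetical. The sketch only shows how one hidden Markov process can drive both cases and tweets, with an extra stable source producing tweets even when the disease signal is low.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson variate by counting unit-rate exponential arrivals in [0, rate)."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < rate:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_region(n_weeks, rng, persistence=0.9, innovation_sd=0.3,
                    case_scale=10.0, tweet_scale=20.0, baseline_tweet_rate=5.0):
    """Simulate weekly (latent, cases, tweets) triples for one region.

    Every parameter here is an illustrative assumption, not a value from the paper.
    """
    latent = 0.0
    weeks = []
    for _ in range(n_weeks):
        # Markov step: the next hidden state depends only on the current one.
        latent = persistence * latent + rng.gauss(0.0, innovation_sd)
        shared_intensity = math.exp(latent)

        # The shared hidden component drives the reported case counts...
        cases = sample_poisson(case_scale * shared_intensity, rng)

        # ...and the disease-related tweets, plus a stable but random
        # background source that is independent of the hidden state.
        disease_tweets = sample_poisson(tweet_scale * shared_intensity, rng)
        background_tweets = sample_poisson(baseline_tweet_rate, rng)

        weeks.append((latent, cases, disease_tweets + background_tweets))
    return weeks


if __name__ == "__main__":
    for week, (latent, cases, tweets) in enumerate(simulate_region(8, random.Random(0)), 1):
        print(f"week {week}: latent={latent:+.2f} cases={cases} tweets={tweets}")
```

In a fitted version of such a model, the latent path and the scale parameters would be unknowns, and an MCMC algorithm like the one the abstract mentions would sample them from their posterior given the observed cases and tweets; that inference step is deliberately left out of this sketch.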

