Mining of health and disease events on Twitter: validating search protocols within the setting of Indonesia
The study provides a tool for public health monitoring in Indonesia, but its contribution is incremental because it validates existing methods on new data.
The study validated a search protocol for detecting health and disease events on Twitter in Indonesia. The protocol achieved good validity, with an AUC above 0.8, indicating that Twitter monitoring can serve as a real-time proxy for health events.
This study seeks to validate a search protocol of ill-health-related terms using Twitter data, which can later be used to understand whether, and how, Twitter can reveal information on the current health situation. We extracted conversations related to health and disease postings on Twitter using a set of pre-defined keywords; assessed the prevalence, frequency, and timing of such content in these conversations; and validated how well this search protocol detected relevant disease tweets. The Classification and Regression Trees (CART) algorithm was used to train and test the search protocols, comparing the disease and health hits they returned against those identified by our team. The predictions showed good validity, with an AUC above 0.8. Our study shows that monitoring of public sentiment on Twitter can be used as a real-time proxy for health events.
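The validation idea described above (scoring tweets with a keyword protocol and comparing the result with manual labels via AUC) can be illustrated with a minimal sketch. The keyword list, the toy tweets, and the function names `keyword_score` and `auc` are all hypothetical and not taken from the study. To keep the example small, it scores tweets by a simple keyword count instead of a trained CART model, so it shows only the AUC comparison step, not the authors' classifier.

```python
# Hypothetical keyword protocol and toy data; NOT the study's actual terms or tweets.
KEYWORDS = {"demam", "batuk", "flu", "sakit"}  # fever, cough, flu, sick/pain


def keyword_score(text, keywords=KEYWORDS):
    """Count how many distinct protocol keywords appear in a tweet."""
    tokens = set(text.lower().split())
    return len(tokens & keywords)


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5 (the rank-based definition)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# (tweet text, manual label): 1 = relevant health event, 0 = not relevant.
tweets = [
    ("anak saya demam dan batuk sejak kemarin", 1),
    ("kena flu lagi, badan sakit semua", 1),
    ("sakit hati ditinggal pacar", 0),  # "heartbroken": keyword hit, not illness
    ("macet parah di jakarta pagi ini", 0),
    ("batuk terus dari semalam", 1),
    ("nonton bola malam ini", 0),
]

scores = [keyword_score(text) for text, _ in tweets]
labels = [label for _, label in tweets]

print(f"AUC of keyword protocol vs. manual labels: {auc(scores, labels):.3f}")
```

The third toy tweet shows why manual validation matters: a keyword such as "sakit" can match a figurative expression, which lowers the AUC relative to a perfect protocol.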