CLLGSIMLJun 13, 2019

Correlating Twitter Language with Community-Level Health Outcomes

arXiv:1906.06465v21090 citations
AI Analysis

This work addresses public health monitoring by enabling predictions of medical outcomes from social media data, though it is incremental as it builds on existing embedding and regression techniques.

The paper tackles the problem of linking social media language to community-level health outcomes like heart disease and diabetes, using a model that predicts these outcomes from language without labeled data and discovers correlations with lifestyle and socioeconomic factors.

We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need of additional labelled data. It allows to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with life-style aspects and other socioeconomic risk factors.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes