CL | AI | CY | SI | Aug 13, 2016

Determining Health Utilities through Data Mining of Social Media

arXiv:1608.03938v1
Originality: Highly original
AI Analysis

This work addresses the high cost and limited scope of traditional health utility assessments, offering patients and researchers a scalable alternative through social media mining.

The paper tackles the problem of estimating health utilities, which measure patient preferences for health states. It proposes a novel method that uses natural language processing to analyze social media data. Starting from a dataset that originally contained 2 billion tweets, the authors filter for content relevant to 60 diseases, and their algorithm successfully distinguishes mild from severe diseases.

'Health utilities' measure patient preferences for perfect health compared to specific unhealthy states, such as asthma, a fractured hip, or colon cancer. When integrated over time, these estimates are called quality-adjusted life years (QALYs). Until now, characterizing health utilities (HUs) has required detailed patient interviews or written surveys. While reliable and specific, these data have remained costly because of the effort needed to locate, enlist and coordinate participants. Thus the scope, context and temporality of the diseases examined have remained limited. Now that more than a billion people use social media, we propose a novel strategy: use natural language processing to analyze public online conversations for signals of the severity of medical conditions, and correlate these to known HUs using machine learning. In this work, we filter a dataset that originally contained 2 billion tweets for relevant content on 60 diseases. Using this data, our algorithm successfully distinguished mild from severe diseases, which had previously been categorized only by traditional techniques. This represents progress towards two related applications: first, predicting HUs where such information is nonexistent; and second, where rich HU data already exist, estimating temporal or geographic patterns of disease severity through data mining.
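The pipeline described in the abstract (filter tweets by disease, extract a severity signal, separate mild from severe) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the disease terms, the severity word list, the sample tweets, and the fixed threshold are hypothetical, and the threshold rule is a simple stand-in for the machine learning step, whose features and model are not specified in this summary.

```python
from collections import defaultdict

# Hypothetical inputs: none of these terms or words come from the paper.
DISEASE_TERMS = {"asthma": ["asthma"], "colon cancer": ["colon cancer"]}
SEVERITY_WORDS = {"hospital", "surgery", "chemo", "died", "pain"}


def filter_tweets(tweets, disease_terms):
    """Group tweets by each disease whose term they mention (case-insensitive)."""
    by_disease = defaultdict(list)
    for text in tweets:
        lowered = text.lower()
        for disease, terms in disease_terms.items():
            if any(term in lowered for term in terms):
                by_disease[disease].append(lowered)
    return dict(by_disease)


def severity_score(texts, severity_words):
    """Fraction of tweets containing at least one severity word."""
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if severity_words & set(t.split()))
    return hits / len(texts)


def classify(by_disease, severity_words, threshold=0.5):
    """Label each disease 'severe' or 'mild' with a fixed threshold.

    The threshold is a stand-in for a trained classifier, not the paper's model.
    """
    return {
        disease: "severe" if severity_score(texts, severity_words) >= threshold else "mild"
        for disease, texts in by_disease.items()
    }


if __name__ == "__main__":
    sample = [
        "My asthma is acting up again today",
        "Forgot my asthma inhaler at home",
        "Dad starts chemo for colon cancer next week",
        "Colon cancer surgery went well, still in hospital",
        "Nice weather today",
    ]
    grouped = filter_tweets(sample, DISEASE_TERMS)
    print(classify(grouped, SEVERITY_WORDS))
```

In a fuller version, the per-disease scores would serve as features for a model trained against known HUs rather than being compared to a hand-picked threshold.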

