CL · CY · SI · Sep 8, 2014

Analyzing the Language of Food on Social Media

arXiv:1409.2195v2 · 99 citations
AI Analysis

This work addresses a problem relevant to researchers and policymakers: understanding community health and demographics from social media data. Its contribution is incremental, however, since it applies existing NLP methods to a new domain.

The study predicts population characteristics such as overweight and diabetes rates from food-related social media language, significantly outperforming majority-class baselines, and supports real-time analysis through an online system.

We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and the home geographical location of authors. For all tasks, our language-based models significantly outperform the majority-class baselines. Performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have the most predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics. Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced heatmaps, semantics-preserving word clouds, and temporal histograms, allow us to discover more complex, global patterns mirrored in the language of food.
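The abstract does not name the classifier used, so the sketch below swaps in a multinomial Naive Bayes model over bag-of-words counts purely to illustrate the general setup: aggregate food-related text per region, predict a binary label, compare against the majority-class baseline, and rank words by how strongly they favor one class. The toy regions, texts, labels, and all names (`TRAIN`, `NaiveBayes`, `top_features`, and so on) are hypothetical and are not the authors' data or method.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each entry stands for the concatenated food-related
# posts from one region, paired with an invented binary label
# (1 = region's overweight rate above some threshold, 0 = below).
# Illustrative only; this is not data from the paper.
TRAIN = [
    ("fried chicken biscuits gravy sweet tea", 1),
    ("bbq ribs fries soda donuts", 1),
    ("kale salad quinoa smoothie", 0),
    ("sushi salad green tea", 0),
    ("burger fries milkshake", 1),
    ("avocado toast salad", 0),
    ("fried catfish hushpuppies sweet tea", 1),
]


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple stand-in)."""
    return text.lower().split()


def majority_baseline(labels):
    """Most frequent training label; predicting it everywhere is the baseline."""
    return Counter(labels).most_common(1)[0][0]


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        """Smoothed log P(word | class c)."""
        return math.log((self.word_counts[c][word] + 1) / (self.totals[c] + len(self.vocab)))

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    score += self.log_likelihood(w, c)
            scores[c] = score
        return max(scores, key=scores.get)

    def top_features(self, c, k=3):
        """Words whose smoothed log-likelihood ratio most favors class c."""
        others = [o for o in self.classes if o != c]

        def ratio(w):
            return self.log_likelihood(w, c) - max(self.log_likelihood(w, o) for o in others)

        # Sort by descending ratio, breaking ties alphabetically for stable output.
        return sorted(self.vocab, key=lambda w: (-ratio(w), w))[:k]


docs, labels = zip(*TRAIN)
model = NaiveBayes().fit(docs, labels)
baseline = majority_baseline(labels)

# Two hypothetical held-out regions.
TEST = [("sweet tea fried okra", 1), ("salad smoothie", 0)]
nb_acc = sum(model.predict(d) == y for d, y in TEST) / len(TEST)
base_acc = sum(baseline == y for _, y in TEST) / len(TEST)
print(f"Naive Bayes accuracy: {nb_acc:.2f}, majority baseline: {base_acc:.2f}")
print("words favoring label 1:", model.top_features(1))
```

Ranking words by a smoothed log-likelihood ratio is one simple way to ask which textual features carry the most predictive power; it illustrates the idea in the abstract and is not the paper's own feature analysis.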
