LG · Oct 1, 2022
Ten Years after ImageNet: A 360° Perspective on AI
Sanjay Chawla, Preslav Nakov, Ahmed Ali et al. · berkeley
It has been ten years since neural networks made their spectacular comeback. Prompted by this anniversary, we take a holistic perspective on Artificial Intelligence (AI). Supervised Learning for cognitive tasks is effectively solved, provided we have enough high-quality labeled data. However, deep neural network models are not easily interpretable, and thus the debate between blackbox and whitebox modeling has come to the fore. The rise of attention networks, self-supervised learning, generative modeling, and graph neural networks has widened the application space of AI. Deep Learning has also propelled the return of reinforcement learning as a core building block of autonomous decision-making systems. The potential harms of new AI technologies have raised socio-technical issues such as transparency, fairness, and accountability. The dominance of AI by Big Tech, which controls talent, computing resources, and, most importantly, data, may lead to an extreme AI divide. Failure to meet high expectations in high-profile and much-heralded flagship projects like self-driving vehicles could trigger another AI winter.
CY · Mar 20
Setting the Course, but Forgetting to Steer: Analyzing Compliance with GDPR's Right of Access to Data by Instagram, TikTok, and YouTube
Sai Keerthana Karnam, Abhisek Dash, Antariksh Das et al.
The GDPR's Right of Access aims to empower users with control over their personal data via Data Download Packages (DDPs). However, the effectiveness of DDPs is often compromised by inconsistent platform implementations, questionable data reliability, and poor user comprehensibility. This paper conducts a comprehensive audit of DDPs from three social media platforms (TikTok, Instagram, and YouTube) to systematically assess these critical drawbacks. We find that, despite offering similar services, these platforms show significant inconsistencies in implementing the Right of Access, evident in the varying levels of data they share. Critically, their failure to disclose processing purposes, retention periods, and third-party data recipients is a further indicator of non-compliance. Our reliability evaluations, using bots and user-donated data, reveal that while TikTok's DDPs offer more consistent and complete data, those of the other platforms exhibit notable shortcomings. Similarly, our assessment of comprehensibility, based on surveys with 400 participants, indicates that current DDPs fall substantially short of GDPR's standards. To improve comprehensibility, we propose and demonstrate a two-layered approach: (1) enhancing the data representation itself using stakeholder interpretations; and (2) incorporating a user-friendly extension (Know Your Data) for intuitive data visualization, in which users can control the level of transparency they prefer. Our findings underscore the need for clearer and non-conflicting regulatory guidance, stricter enforcement, and platform commitment to realize the goal of GDPR's Right of Access.
CY · May 22
Divergent Paths to Depolarization: Dialogue Design Determines the Prosocial Benefits of AI-Assisted Political Argumentation
Jianlong Zhu, Syed Muhammad Jhon Raza Naqvi, Carolin-Theresa Ziemer et al.
Argumentative dialogues across political divides can reduce polarization, yet opportunities for citizens to engage with opposing views in accessible and structured ways remain limited. AI dialogue partners offer a scalable framework for such open-mindedness exercises, but how the format of human-AI dialogues shapes their benefits remains unclear. In a two-session online experiment, 469 US participants were assigned to argue either for or against their own attitude on a contested political issue with an AI chatbot. Our experimental findings show that attitude-congruent dialogues produced a greater immediate reduction in both affective and opinion polarization than attitude-incongruent dialogues. By contrast, attitude-incongruent dialogues elicited weaker cognitive state empathy than the non-AI reference task but increased cognitive trait empathy over the two-week period between sessions, suggesting that the effects of actively generating attitude-incongruent arguments may emerge over time. These findings highlight dialogue design as a key determinant of effective AI-mediated behavioral interventions.
CY · Nov 9, 2025
Hope, Aspirations, and the Impact of LLMs on Female Programming Learners in Afghanistan
Hamayoon Behmanush, Freshta Akhtari, Roghieh Nooripour et al.
Designing impactful educational technologies in contexts of socio-political instability requires a nuanced understanding of educational aspirations. Currently, scalable metrics for measuring aspirations are limited. This study adapts, translates, and evaluates Snyder's Hope Scale as a metric for measuring aspirations among 136 women learning programming online during a period of systemic educational restrictions in Afghanistan. The adapted scale demonstrated good reliability (Cronbach's α = 0.78) and participants rated it as understandable and relevant. While overall aspiration-related scores did not differ significantly by access to Large Language Models (LLMs), those with access reported marginally higher scores on the Avenues subscale (p = .056), suggesting broader perceived pathways to achieving educational aspirations. These findings support the use of the adapted scale as a metric for aspirations in contexts of socio-political instability. More broadly, the adapted scale can be used to evaluate the impact of aspiration-driven design of educational technologies.
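The reliability figure reported above is Cronbach's α. As a minimal sketch, assuming the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores) and a hypothetical matrix of item scores (not the study's data), it can be computed as follows:

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Population variances are used throughout; the ratio is the same as with
    sample variances because both share the same denominator.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([r[i] for r in responses]) for i in range(k)]
    # Variance of each respondent's total score
    total_var = pvariance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 1-5 ratings from four respondents on three items
ratings = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 2, 3]]
print(round(cronbach_alpha(ratings), 2))
```

Items that move together across respondents push α toward 1; items that vary independently pull it down, and α can even turn negative.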
SI · Jan 30, 2020 · Code
Going beyond accuracy: estimating homophily in social networks using predictions
George Berry, Antonio Sirianni, Ingmar Weber et al.
In online social networks, it is common to use predictions of node categories to estimate measures of homophily and other relational properties. However, online social network data often lacks basic demographic information about the nodes. Researchers must rely on predicted node attributes to estimate measures of homophily, but little is known about the validity of these measures. We show that estimating homophily in a network can be viewed as a dyadic prediction problem, and that homophily estimates are unbiased when dyad-level residuals sum to zero in the network. Node-level prediction models, such as the use of names to classify ethnicity or gender, do not generally have this property and can introduce large biases into homophily estimates. Bias occurs due to error autocorrelation along dyads. Importantly, node-level classification performance is not a reliable indicator of estimation accuracy for homophily. We compare estimation strategies that make predictions at the node and dyad levels, evaluating performance in different settings. We propose a novel "ego-alter" modeling approach that outperforms standard node and dyad classification strategies. While this paper focuses on homophily, results generalize to other relational measures which aggregate predictions along the dyads in a network. We conclude with suggestions for research designs to study homophily in online networks. Code for this paper is available at https://github.com/georgeberry/autocorr.
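To make the dyadic view concrete, here is a minimal sketch under simplifying assumptions: homophily is measured as the fraction of edges joining same-category nodes (one of several possible measures), and the network and labels are hypothetical. The gap between the estimate from predicted labels and the estimate from true labels equals the mean dyad-level residual, so the estimate is unbiased exactly when those residuals sum to zero. A single node-level error touches every dyad that node belongs to, which illustrates the error autocorrelation along dyads described above.

```python
def same_category(labels, u, v):
    """1 if the two endpoints of dyad (u, v) share a category, else 0."""
    return 1 if labels[u] == labels[v] else 0


def homophily(edges, labels):
    """Fraction of edges whose endpoints share a category (a simple homophily measure)."""
    return sum(same_category(labels, u, v) for u, v in edges) / len(edges)


def dyad_residuals(edges, true_labels, pred_labels):
    """Per-dyad residual: predicted same-category indicator minus the true indicator."""
    return [
        same_category(pred_labels, u, v) - same_category(true_labels, u, v)
        for u, v in edges
    ]


# Hypothetical four-node network with two categories
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
true_labels = {0: "a", 1: "a", 2: "b", 3: "b"}
# Hypothetical node-level predictions: a single error, on node 2
pred_labels = {0: "a", 1: "a", 2: "a", 3: "b"}

residuals = dyad_residuals(edges, true_labels, pred_labels)
bias = homophily(edges, pred_labels) - homophily(edges, true_labels)
print(residuals)  # node 2's error makes all three of its dyads nonzero
print(bias, sum(residuals) / len(edges))  # equal up to floating-point rounding
```

Here one misclassified node out of four shifts the homophily estimate from 0.4 to 0.6, because its error is shared by every dyad it participates in.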
CY · Mar 23
You See It, They Don't: An Exploratory Study of User-to-User Variation in Instagram Comments
Brahmani Nutakki, Manon Lilott Kempermann, Ingmar Weber
In March 2025, Meta announced a new AI system to rank the order of the comments shown to Instagram users. With existing research showing how feed personalization systems can lead to increased polarization, the introduction of this new system raises similar questions. This paper presents a small-scale exploratory study examining whether the ranking system produces systematic differences in visible comments shown to different users, particularly for news-related content. Using four sock-puppet accounts varying in gender and political leaning, we collect visible comments on posts from ten news and ten non-news accounts. This collection is repeated twice from two VPN locations to assess location effects. We ask (1) how many visible comments vary across different users, (2) whether this variation is higher for news accounts than for non-news accounts, and (3) whether user attributes like gender, political leaning, and location can systematically explain the observed variation. Contrary to our expectations, we find that visible comments on news posts are less likely to vary across users than those on non-news posts. Variation is better explained by account metrics like comment and follower counts than by user attributes. These findings provide an initial glimpse into personalized comment ranking on Instagram and motivate larger, more systematic audits of how comment personalization may shape online discourse. To support further research, we provide the code to collect comments and the data upon request.
AI · Dec 11, 2025
Challenges of Evaluating LLM Safety for User Welfare
Manon Kempermann, Sai Suresh Macharla Vasu, Mahalakshmi Raveenthiran et al.
Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities. However, millions of people use LLMs for personal advice on high-stakes topics like finance and health, where harms are context-dependent rather than universal. While frameworks like the OECD's AI classification recognize the need to assess individual risks, user-welfare safety evaluations remain underdeveloped. We argue that developing such evaluations is non-trivial due to fundamental questions about accounting for user context in evaluation design. In this exploratory study, we evaluated advice on finance and health from GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro across user profiles of varying vulnerability. First, we demonstrate that evaluators must have access to rich user context: identical LLM responses were rated significantly safer by context-blind evaluators than by those aware of user circumstances, with safety scores for high-vulnerability users dropping from safe (5/7) to somewhat unsafe (3/7). One might assume this gap could be addressed by creating realistic user prompts containing key contextual information. However, our second study challenges this: we reran the evaluation on prompts containing the context users report they would disclose and found no significant improvement. Our work establishes that effective user-welfare safety evaluation requires evaluators to assess responses against diverse user profiles, as realistic user context disclosure alone proves insufficient, particularly for vulnerable populations. By demonstrating a methodology for context-aware evaluation, this study provides both a starting point for such assessments and foundational evidence that evaluating individual welfare demands approaches distinct from existing universal-risk frameworks. We publish our code and dataset to aid future developments.
CV · May 28, 2025
VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond
Noora Al-Emadi, Ingmar Weber, Yin Yang et al.
Detecting vehicles in satellite images is crucial for traffic management, urban planning, and disaster response. However, current models struggle with real-world diversity, particularly across different regions. This challenge is amplified by geographic bias in existing datasets, which often focus on specific areas and overlook regions like the Middle East. To address this gap, we present the Vehicles in the Middle East (VME) dataset, designed explicitly for vehicle detection in high-resolution satellite images from Middle Eastern countries. Sourced from Maxar, the VME dataset spans 54 cities across 12 countries, comprising over 4,000 image tiles and more than 100,000 vehicles, annotated using both manual and semi-automated methods. Additionally, we introduce the largest benchmark dataset for Car Detection in Satellite Imagery (CDSI), combining images from multiple sources to enhance global car detection. Our experiments demonstrate that models trained on existing datasets perform poorly on Middle Eastern images, while the VME dataset significantly improves detection accuracy in this region. Moreover, state-of-the-art models trained on CDSI achieve substantial improvements in global car detection.
CY · Apr 8
Designing Safe and Accountable GenAI as a Learning Companion with Women Banned from Formal Education
Hamayoon Behmanush, Freshta Akhtari, Ingmar Weber et al.
In gender-restrictive and surveilled contexts, where access to formal education may be restricted for women, pursuing education involves safety and privacy risks. When women are excluded from schools and universities, they often turn to online self-learning and generative AI (GenAI) to pursue their educational and career aspirations. However, we know little about what safe and accountable GenAI support is required in the context of surveillance, household responsibilities, and the absence of learning communities. We present a remote participatory design study with 20 women in Afghanistan, informed by a recruitment survey (n = 140), examining how participants envision GenAI for learning and employability. Participants describe using GenAI less as an information source and more as an always-available peer, mentor, and source of career guidance that helps compensate for the absence of learning communities. At the same time, they emphasize that this companionship is constrained by privacy and surveillance risks, contextually unrealistic and culturally unsafe support, and direct-answer interactions that can undermine learning by creating an illusion of progress. Beyond eliciting requirements, envisioning the future with GenAI through participatory design was associated with significant increases in participants' aspirations (p = .01), perceived agency (p = .01), and perceived avenues (p = .03). These outcomes show that accountable and safe GenAI is not only about harm reduction but can also actively enable women to imagine and pursue viable learning and employment futures. Building on this, we translate participants' proposals into accountability-focused design directions that center on safety-first interaction and user control, context-grounded support under constrained resources, and pedagogically aligned assistance that supports genuine learning rather than quick answers.
HC · Apr 5
Teacher Professional Development on WhatsApp and LLMs: Early Lessons from Cameroon
Vikram Kamath Cannanure, Bruno Yinkfu, Douglas Bryan et al.
AI in education is commonly delivered through web-based systems such as online forms and institutional platforms. However, these approaches can exclude teachers in low-resource contexts, where everyday mobile platforms like WhatsApp serve as primary digital infrastructure. To address this gap, we present a field pilot in Cameroon that deploys a WhatsApp-based chatbot with LLM-supported content for teacher professional development (TPD), compared with an online form baseline. The system was evaluated through a mixed-methods study with 47 primary school teachers, integrating quantitative measures with qualitative insights from interviews and participant feedback. Results show that the chatbot was rated higher in perceived usability and overall experience, while learnability remained comparable. These improvements were driven by platform familiarity, low interaction overhead, and the modular structure of LLM-supported content, but were constrained by connectivity limitations, prepaid data costs, and multilingual needs (English/French). Building on these findings, we outline design directions for multilingual, culturally grounded interaction and for supporting prompting and reflection in AI use. More broadly, this work points to Thoughtful AI that supports reflection, relevance, and sustained professional growth.
LG · Aug 15, 2025
A Global Dataset of Location Data Integrity-Assessed Reforestation Efforts
Angela John, Selvyn Allotey, Till Koebe et al.
Afforestation and reforestation are popular strategies for mitigating climate change by enhancing carbon sequestration. However, the effectiveness of these efforts is often self-reported by project developers or certified through processes with limited external validation. This leads to concerns about data reliability and project integrity. In response to increasing scrutiny of voluntary carbon markets, this study presents a dataset on global afforestation and reforestation efforts compiled from primary (meta-)information and augmented with time-series satellite imagery and other secondary data. Our dataset covers 1,289,068 planting sites from 45,628 projects spanning 33 years. Since any remote sensing-based validation effort relies on the integrity of a planting site's geographic boundary, this dataset introduces a standardized assessment of the provided site-level location information, which we summarize in one easy-to-communicate key indicator: LDIS, the Location Data Integrity Score. We find that approximately 79% of the georeferenced planting sites monitored fail on at least 1 out of 10 LDIS indicators, while 15% of the monitored projects lack machine-readable georeferenced data in the first place. In addition to enhancing accountability in the voluntary carbon market, the presented dataset also holds value as training data, e.g., for computer vision tasks, with millions of linked Sentinel-2 and PlanetScope satellite images.
CL · May 25, 2025
Misleading through Inconsistency: A Benchmark for Political Inconsistencies Detection
Nursulu Sagimbayeva, Ruveyda Betül Bahçeci, Ingmar Weber
Inconsistent political statements represent a form of misinformation. When left unnoticed, they erode public trust and pose challenges to accountability. Detecting inconsistencies automatically could support journalists in asking clarification questions, thereby helping to keep politicians accountable. We propose the Inconsistency detection task and develop a scale of inconsistency types to prompt NLP research in this direction. To provide a resource for detecting inconsistencies in the political domain, we present a dataset of 698 human-annotated pairs of political statements, with explanations of the annotators' reasoning for 237 samples. The statements mainly come from voting assistant platforms such as Wahl-O-Mat in Germany and Smartvote in Switzerland, reflecting real-world political issues. We benchmark Large Language Models (LLMs) on our dataset and show that, in general, they are as good as humans at detecting inconsistencies and might even be better than individual humans at predicting the crowd-annotated ground truth. However, when it comes to identifying fine-grained inconsistency types, none of the models has reached the upper bound of performance (due to natural labeling variation), leaving room for improvement. We make our dataset and code publicly available.
CV · May 7, 2025
A Weak Supervision Learning Approach Towards an Equitable Mobility Estimation
Theophilus Aidoo, Till Koebe, Akansh Maurya et al.
The scarcity and high cost of labeled high-resolution imagery have long challenged remote sensing applications, particularly in low-income regions where high-resolution data are scarce. In this study, we propose a weak supervision framework that estimates parking lot occupancy using 3m resolution satellite imagery. By leveraging coarse temporal labels -- based on the assumption that parking lots of major supermarkets and hardware stores in Germany are typically full on Saturdays and empty on Sundays -- we train a pairwise comparison model that achieves an AUC of 0.92 on large parking lots. The proposed approach minimizes the reliance on expensive high-resolution images and holds promise for scalable urban mobility analysis. Moreover, the method can be adapted to assess transit patterns and resource allocation in vulnerable communities, providing a data-driven basis to improve the well-being of those most in need.
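The weak labels rest on a single weekday assumption. As a minimal sketch, assuming each image record is a hypothetical (lot_id, acquisition_date, image_ref) tuple, Saturday and Sunday images of the same lot can be turned into ordered training pairs for a pairwise comparison model; the model itself is not shown, and this is not necessarily how the authors built their pairs.

```python
from datetime import date
from itertools import product

SATURDAY, SUNDAY = 5, 6  # date.weekday(): Monday is 0


def weak_pairs(images):
    """Build (lot_id, fuller_image, emptier_image) pairs from coarse temporal labels.

    images: iterable of hypothetical (lot_id, acquisition_date, image_ref) tuples.
    Weak-label assumption from the text: a lot is full on Saturdays and empty on
    Sundays, so every Saturday image of a lot should rank above every Sunday image.
    """
    by_lot = {}
    for lot_id, day, ref in images:
        by_lot.setdefault(lot_id, []).append((day, ref))
    pairs = []
    for lot_id, items in by_lot.items():
        saturdays = [ref for day, ref in items if day.weekday() == SATURDAY]
        sundays = [ref for day, ref in items if day.weekday() == SUNDAY]
        pairs.extend((lot_id, sat, sun) for sat, sun in product(saturdays, sundays))
    return pairs


images = [
    ("lot1", date(2024, 6, 1), "lot1_sat.tif"),  # Saturday
    ("lot1", date(2024, 6, 2), "lot1_sun.tif"),  # Sunday
    ("lot1", date(2024, 6, 3), "lot1_mon.tif"),  # Monday: no weak label, ignored
    ("lot2", date(2024, 6, 2), "lot2_sun.tif"),  # no Saturday image to pair with
]
print(weak_pairs(images))
```

Pairing only within a lot means the model learns relative occupancy rather than absolute car counts, which is what lets coarse day-of-week labels stand in for per-vehicle annotations.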
CY · May 5, 2025
Coverage Biases in High-Resolution Satellite Imagery
Vadim Musienko, Axel Jacquet, Ingmar Weber et al.
Satellite imagery is increasingly used across scientific disciplines to complement traditional data collection approaches such as surveys and censuses. However, we ask: do all places on Earth benefit equally from this new wealth of information? In this study, we investigate the coverage bias of major satellite constellations that provide optical satellite imagery with a ground sampling distance below 10 meters, evaluating both future on-demand tasking opportunities and the availability of historic images across the globe. Looking forward, we estimate how often different places are revisited during a 30-day window based on the satellites' orbital paths, thus investigating potential coverage biases caused by physical factors. We find that locations farther from the equator are generally revisited more frequently by the constellations under study. Looking backward, we show that historic satellite image availability, based on metadata collected from major satellite imagery providers, is influenced by socio-economic factors on the ground: less developed, less populated places have fewer satellite images available. Furthermore, in three small case studies on recent conflict regions, namely Gaza, Sudan, and Ukraine, we show that geopolitical events also play an important role in satellite image availability, hinting at underlying business model decisions. These insights lay bare that the digital dividend yielded by satellite imagery is not equally distributed across our planet.
CV · Jun 6, 2019
How to make a pizza: Learning a compositional layer-based GAN model
Dim P. Papadopoulos, Youssef Tamaazousti, Ferda Ofli et al.
A food recipe is an ordered set of instructions for preparing a particular dish. From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g., adding an ingredient) or changing the appearance of the existing ones (e.g., cooking the dish). In this paper, we aim to teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure. To do so, we learn composable module operations which are able to either add or remove a particular ingredient. Each operator is designed as a Generative Adversarial Network (GAN). Given only weak image-level supervision, the operators are trained to generate a visual layer that needs to be added to or removed from the existing image. The proposed model is able to decompose an image into an ordered sequence of layers by sequentially applying the corresponding removal modules in the right order. Experimental results on synthetic and real pizza images demonstrate that our proposed model is able to: (1) segment pizza toppings in a weakly supervised fashion, (2) remove them by revealing what is occluded underneath them (i.e., inpainting), and (3) infer the ordering of the toppings without any depth ordering supervision. Code, data, and models are available online.
CL · May 29, 2019
Racial Bias in Hate Speech and Abusive Language Detection Datasets
Thomas Davidson, Debasmita Bhattacharya, Ingmar Weber
Technologies for abusive language detection are being developed and applied with little consideration of their potential biases. We examine racial bias in five different sets of Twitter data annotated for hate speech and abusive language. We train classifiers on these datasets and compare the predictions of these classifiers on tweets written in African-American English with those written in Standard American English. The results show evidence of systematic racial bias in all datasets, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates. If these abusive language detection systems are used in the field, they will therefore have a disproportionate negative impact on African-American social media users. Consequently, these systems may discriminate against the groups who are often the targets of the abuse we are trying to detect.
CV · Oct 14, 2018
Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
Javier Marin, Aritro Biswas, Ferda Ofli et al.
In this paper, we introduce Recipe1M+, a new large-scale, structured corpus of over one million cooking recipes and 13 million food images. As the largest publicly available collection of recipe data, Recipe1M+ affords the ability to train high-capacity models on aligned, multimodal data. Using these data, we train a neural network to learn a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Moreover, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M+ dataset and food and cooking in general. Code, data and models are publicly available.
CL · May 28, 2017
Understanding Abuse: A Typology of Abusive Language Detection Subtasks
Zeerak Waseem, Thomas Davidson, Dana Warmsley et al.
As the body of research on abusive language detection and analysis grows, there is a need for critical consideration of the relationships between different subtasks that have been grouped under this label. Based on work on hate speech, cyberbullying, and online abuse, we propose a typology that captures central similarities and differences between subtasks, and we discuss its implications for data annotation and feature construction. We emphasize the practical actions that can be taken by researchers to best approach their abusive language detection subtask of interest.
CL · Apr 1, 2017
Psychological and Personality Profiles of Political Extremists
Meysam Alizadeh, Ingmar Weber, Claudio Cioffi-Revilla et al.
Global recruitment into radical Islamic movements has spurred renewed interest in the appeal of political extremism. Is the appeal a rational response to material conditions, or is it the expression of psychological and personality disorders associated with aggressive behavior, intolerance, conspiratorial imagination, and paranoia? Empirical answers using surveys have been limited by lack of access to extremist groups, while field studies have lacked psychological measures and failed to compare extremists with contrast groups. We revisit the debate over the appeal of extremism in the U.S. context by comparing publicly available Twitter messages written by over 355,000 political extremist followers with messages written by non-extremist U.S. users. Analysis of text-based psychological indicators supports moral foundations theory, which identifies emotion as a critical factor in determining the political orientation of individuals. Extremist followers also differ from others in four of the Big Five personality traits.
CL · Mar 11, 2017
Automated Hate Speech Detection and the Problem of Offensive Language
Thomas Davidson, Dana Warmsley, Michael Macy et al.
A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech, and previous work using supervised learning has failed to distinguish between the two categories. We use a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords. We use crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither. We train a multi-class classifier to distinguish between these different categories. Close analysis of the predictions and the errors shows when we can reliably separate hate speech from other offensive language and when this differentiation is more difficult. We find that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.
HC · Mar 9, 2017
Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media
Enes Kocabey, Mustafa Camurcu, Ferda Ofli et al.
A person's weight status can have profound implications for their life, ranging from mental health to longevity to financial income. At the societal level, "fat shaming" and other forms of "sizeism" are a growing concern, while increasing obesity rates are linked to ever-rising healthcare costs. For these reasons, researchers from a variety of backgrounds are interested in studying obesity from all angles. To obtain data, traditionally, a person would have to accurately self-report their body-mass index (BMI) or would have to see a doctor to have it measured. In this paper, we show how computer vision can be used to infer a person's BMI from social media images. We hope that our tool, which we release, helps to advance the study of social aspects related to body weight.
CY · Feb 21, 2017
Is Saki #delicious? The Food Perception Gap on Instagram and Its Relation to Health
Ferda Ofli, Yusuf Aytar, Ingmar Weber et al.
Food is an integral part of our life, and what and how much we eat crucially affects our health. Our food choices largely depend on how we perceive certain characteristics of food, such as whether it is healthy, delicious, or qualifies as a salad. But these perceptions differ from person to person, and one person's "single lettuce leaf" might be another person's "side salad". Studying how food is perceived in relation to what it actually is typically involves a laboratory setup. Here we propose to use recent advances in image recognition to tackle this problem. Concretely, we use data for 1.9 million Instagram images from the US to look at systematic differences in how a machine would objectively label an image compared to how a human subjectively does. We show that this difference, which we call the "perception gap", relates to a number of health outcomes observed at the county level. To the best of our knowledge, this is the first time that image recognition has been used to study the "misalignment" between how people describe food images and what they actually depict.
CY · Jul 29, 2016
Extracting Food Substitutes From Food Diary via Distributional Similarity
Palakorn Achananuparp, Ingmar Weber
In this paper, we explore the problem of identifying substitute relationships between food pairs from real-world food consumption data as a first step towards healthier food recommendation. Our method is inspired by the distributional hypothesis in linguistics. Specifically, we assume that foods consumed in similar contexts are more likely to be dietarily similar. For example, a turkey sandwich can be considered a suitable substitute for a chicken sandwich if both tend to be consumed with french fries and salad. To evaluate our method, we constructed a real-world food consumption dataset from MyFitnessPal's public food diary entries and obtained ground-truth human judgements of food substitutes from a crowdsourcing service. The experimental results suggest that the method is effective in identifying suitable substitutes.
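The distributional idea can be illustrated in a few lines. This is a minimal sketch, not the paper's exact method: it assumes each diary meal is a set of food names, represents each food by counts of the other foods logged in the same meal, and compares foods by cosine similarity of those context vectors. The meals are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(meals):
    """Map each food to a Counter of the other foods logged in the same meal."""
    vectors = defaultdict(Counter)
    for meal in meals:
        for food in meal:
            for other in meal:
                if other != food:
                    vectors[food][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical diary meals, echoing the sandwich example above
meals = [
    {"turkey sandwich", "french fries", "salad"},
    {"chicken sandwich", "french fries", "salad"},
    {"oatmeal", "banana"},
]
vecs = context_vectors(meals)
print(cosine(vecs["turkey sandwich"], vecs["chicken sandwich"]))  # shared context: high
print(cosine(vecs["turkey sandwich"], vecs["oatmeal"]))  # no shared context: 0.0
```

The two sandwiches are never logged together in this toy data; their similarity comes entirely from the context they share.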
HC · Jul 21, 2016
#Sleep_as_Android: Feasibility of Using Sleep Logs on Twitter for Sleep Studies
Fatema Akbar, Ingmar Weber
Social media enjoys growing popularity as a platform to seek and share personal health information. For sleep studies using data from social media, most researchers have focused on inferring sleep-related artifacts from self-reported anecdotal pointers to sleep patterns or issues such as insomnia. The data shared by "quantified-selfers" on social media presents an opportunity to study more quantitative and objective measures of sleep. We propose and validate the approach of collecting and analyzing sleep logs that are generated and shared through a sleep-tracking mobile application. We highlight the value of this data by combining it with users' social media data. The results provide a validation of using social media for sleep studies as the collected sleep data is aligned with sleep data from other sources. The results of combining social media data with sleep data provide preliminary evidence that higher social media activity is associated with lower sleep duration and quality.
HCFeb 23, 2016
Crowdsourcing Health Labels: Inferring Body Weight from Profile Pictures
Ingmar Weber, Yelena Mejova
To use social media for health-related analysis, one key step is detecting health-related labels for users. But unlike transient conditions such as flu, chronic conditions such as obesity are something social media users are less vocal about, as they might not tweet "I'm still overweight". However, as obesity-related conditions such as diabetes, heart disease, osteoarthritis, and even cancer are on the rise, this obese-or-not label could be one of the most useful for public health studies. In this paper, we investigate the feasibility of using profile pictures to infer whether a user is overweight. We show that this is indeed possible and further show that the fraction of users labeled as overweight is higher in U.S. counties with higher obesity rates. Moving from public to individual health analysis, we then find differences in both behavior and social networks; for example, users labeled as overweight have fewer followers.
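The county-level check can be sketched as an aggregation followed by a comparison. Assuming per-user overweight labels (for example, from crowdsourced judgements of profile pictures) and known county obesity rates, the sketch computes the fraction of labeled-as-overweight users per county and compares counties above and below the median obesity rate. All counties, labels, and rates are hypothetical, and the median split is an illustrative choice, not necessarily the paper's analysis.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-user labels: (county, labeled_as_overweight).
LABELS = [
    ("A", True), ("A", True), ("A", False),
    ("B", False), ("B", False), ("B", True),
    ("C", True), ("C", False),
    ("D", False), ("D", False),
]
# Hypothetical county obesity rates (fraction of adults).
OBESITY_RATE = {"A": 0.35, "B": 0.24, "C": 0.33, "D": 0.22}

def overweight_fraction_by_county(labels):
    """Fraction of users labeled as overweight in each county."""
    counts = defaultdict(lambda: [0, 0])  # county -> [overweight, total]
    for county, overweight in labels:
        counts[county][0] += int(overweight)
        counts[county][1] += 1
    return {c: ow / total for c, (ow, total) in counts.items()}

def compare_high_vs_low(fractions, rates):
    """Mean labeled-overweight fraction in counties above vs. at/below the median obesity rate."""
    cut = median(rates[c] for c in fractions)
    high = [f for c, f in fractions.items() if rates[c] > cut]
    low = [f for c, f in fractions.items() if rates[c] <= cut]
    if not high or not low:
        raise ValueError("median split left one group empty")
    return sum(high) / len(high), sum(low) / len(low)
```

On this toy data, the high-obesity counties (A and C) show a larger mean fraction of labeled-as-overweight users than the low-obesity ones (B and D), which is the pattern the abstract describes.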
HCOct 16, 2015
Insights from Machine-Learned Diet Success Prediction
Ingmar Weber, Palakorn Achananuparp
To support people trying to lose weight and stay healthy, a growing number of fitness apps let users track both calorie intake and expenditure. Users of such apps are part of a wider "quantified self" movement, and many opt in to share their logged data publicly. In this paper, we use the public food diaries of more than 4,000 long-term active MyFitnessPal users to study the characteristics of (un)successful diets. Concretely, we train a machine learning model to predict whether a user repeatedly goes over or stays under their self-set daily calorie goal, and then examine which features contribute to the model's predictions. Our findings include expected results, such as the token "mcdonalds" or the category "dessert" being indicative of exceeding the calorie goal, but also less obvious ones, such as the difference between pork and poultry with respect to dieting success, or the use of the "quick added calories" functionality being indicative of overshooting the calorie goal. This study also hints at the feasibility of using such data for more in-depth data mining, e.g., examining interactions between consumed foods, such as mixing protein- and carbohydrate-rich foods. To the best of our knowledge, this is the first systematic study of public food diaries.
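The "train a model, then inspect its features" workflow can be sketched with a linear classifier whose weights are directly readable. The abstract does not name the model; this sketch swaps in logistic regression over binary token features, trained with plain batch gradient descent, and simplifies the target to a per-day over/under-goal label. The diary data are invented.

```python
from math import exp

# Hypothetical diary days: (tokens logged that day, 1 if over the calorie goal else 0).
DAYS = [
    ({"mcdonalds", "fries"}, 1),
    ({"dessert", "coffee"}, 1),
    ({"mcdonalds", "coffee"}, 1),
    ({"salad", "chicken"}, 0),
    ({"salad", "coffee"}, 0),
    ({"chicken", "rice"}, 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_logreg(data, epochs=500, lr=0.1):
    """Fit logistic regression over binary token features with batch gradient descent."""
    vocab = sorted({t for tokens, _ in data for t in tokens})
    w = {t: 0.0 for t in vocab}
    b = 0.0
    for _ in range(epochs):
        grad_w = {t: 0.0 for t in vocab}
        grad_b = 0.0
        for tokens, y in data:
            p = sigmoid(b + sum(w[t] for t in tokens))
            err = p - y  # derivative of the log loss w.r.t. the logit
            grad_b += err
            for t in tokens:
                grad_w[t] += err
        for t in vocab:
            w[t] -= lr * grad_w[t] / len(data)
        b -= lr * grad_b / len(data)
    return w, b

weights, bias = train_logreg(DAYS)
# Positive weights push predictions toward "over the goal", negative ones toward "under".
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
```

Sorting `weights` puts tokens associated with exceeding the goal (here `mcdonalds`) near the top and those associated with staying under it (here `salad`) near the bottom. With real data, regularization and held-out evaluation would matter; the sketch omits both.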
HCAug 3, 2015
360 Quantified Self
Hamed Haddadi, Ferda Ofli, Yelena Mejova et al.
Wearable devices with a wide range of sensors have contributed to the rise of the Quantified Self movement, in which individuals log everything from the number of steps they have taken, to their heart rate, to their sleeping patterns. Sensors do not, however, typically sense the social and ambient environment of users, such as general lifestyle attributes or information about their social network. This means that users themselves, and the medical practitioners privy to their wearable sensor data, have only a narrow view of the individual, limited mainly to certain aspects of their physical condition. In this paper, we describe a number of use cases showing how social media can complement check-up and sensor data to give a more holistic view of individuals' health, a perspective we call the 360 Quantified Self. Health-related information can be obtained from sources as diverse as food photo sharing, location check-ins, or profile pictures. Additionally, information from a person's ego network can shed light on the social dimension of wellbeing, which is widely acknowledged to be of utmost importance even though such information is currently rarely used for medical diagnosis. We articulate a long-term vision describing the technical advances and the variety of data needed to achieve an integrated system that combines Electronic Health Records (EHR) and data from wearable devices with information derived from social media data.
SIJan 25, 2015
Building Bridges into the Unknown: Personalizing Connections to Little-known Countries
Yelena Mejova, Javier Borge-Holthoefer, Ingmar Weber
How are you related to Malawi? Do recent events in the Comoros affect you in any subtle way? Who in your extended social network is in Croatia? We seldom ask ourselves these questions, yet a "long tail" of content beyond our everyday knowledge is waiting to be explored. In this work, we propose a recommendation task of creating interest in little-known content by building personalized "bridges" to users. As an example task, we consider interesting users in little-known countries, and propose a system that aggregates a user's Twitter profile, network, and tweets into an interest model, which is then matched to a library of knowledge about the countries. We conduct a user study with 69 participants and 11 in-depth interviews to evaluate the efficacy of the proposed approach and to gather qualitative insight into how multi-faceted use of Twitter affects the perception of the bridges. We find that the increase in interest in little-known content depends greatly on users' pre-existing disposition towards it. Additionally, we identify a set of vital properties that good bridges must possess, including recency, novelty, emotiveness, and a proper selection of language. With the proposed approach, we aim to harvest these "invisible connections" and make explicit the idea of a "small world", in which even a faraway country is more closely connected to you than you might have imagined.
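The matching step can be sketched as scoring library entries against a user's term weights. The abstract does not describe the matching procedure; this sketch assumes a bag-of-words interest model built from the user's texts and a library of country facts tagged with topic terms, scoring each fact by the summed weight of its tags. The library entries are placeholders, and the scoring ignores properties such as recency, novelty, and emotiveness that the study found good bridges need.

```python
from collections import Counter

def interest_model(texts):
    """Hypothetical interest model: lowercase word counts across a user's profile and tweets."""
    model = Counter()
    for text in texts:
        model.update(word.strip(".,!?#@").lower() for word in text.split())
    return model

# Hypothetical knowledge library: placeholder facts per country, each tagged with topic terms.
LIBRARY = {
    "Malawi": [
        ("placeholder fact linking Malawi to lakes and fish", {"fish", "lake", "diving"}),
        ("placeholder fact linking Malawi to football", {"football", "soccer"}),
    ],
    "Comoros": [
        ("placeholder fact linking the Comoros to perfume", {"perfume", "fragrance"}),
    ],
}

def rank_bridges(model, library, top_k=2):
    """Score each fact by the summed interest weight of its tags; return the best bridges."""
    scored = []
    for country, facts in library.items():
        for fact, tags in facts:
            score = sum(model[t] for t in tags)  # Counter returns 0 for unseen terms
            if score > 0:
                scored.append((score, country, fact))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(country, fact) for _, country, fact in scored[:top_k]]
```

For a user who tweets mostly about football, only the football-tagged Malawi entry scores above zero and is returned as a bridge.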