Matthew Price

CR
5 papers
4 citations
Novelty 27%
AI Score 41

5 Papers

15.4 CY Apr 23
Taste for Privacy: How Context, Identity, and Lived-Experience Shape Information Sharing Preferences

Juniper Lovato, Laurent Hébert-Dufresne, Mohsen Ghasemizade et al.

Privacy preferences are not fixed individual traits; they depend on context and lived experiences. In this study, we analyze 2,912 survey responses from 782 college students collected over seven survey periods during 2023 and 2024. We ask about their usage of social media, the security settings of their accounts, and measure their comfort in sharing personally identifiable information (PII) across 17 different institutional contexts. Compared to past research, we observe a large shift towards private accounts, going from one-third private in 2007 to two-thirds in 2024, and find that participants' discomfort sharing PII with social media platforms strongly predicts their privacy settings. Beyond social media, we identify a stable ranking of institutional trust, though some institutions, like the police, show high variability reflecting divergent lived experiences. Traditionally marginalized groups and participants who have faced adverse childhood experiences show more discomfort with institutions of power, especially in areas where they face greater vulnerability. We argue for context-adaptive privacy settings that recognize institutional relationships and demographic vulnerabilities, moving beyond one-size-fits-all consent frameworks toward contextually appropriate data governance.

48.0 CR Mar 28
Context-Aware Phishing Email Detection Using Machine Learning and NLP

Amitabh Chakravorty, Matthew Price, Nelly Elsayed et al.

Phishing attacks remain among the most prevalent cybersecurity threats, causing significant financial losses for individuals and organizations worldwide. This paper presents a machine learning-based phishing email detection system that analyzes email body content using natural language processing (NLP) techniques. Unlike existing approaches that primarily focus on URL analysis, our system classifies emails by extracting contextual features from the entire email content. We evaluated two classification models, Naive Bayes and Logistic Regression, trained on a combined corpus of 53,973 labeled emails from three distinct datasets. Our preprocessing pipeline incorporates lowercasing, tokenization, stop-word removal, and lemmatization, followed by Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction with unigrams and bigrams. Experimental results demonstrate that Logistic Regression achieves 95.41% accuracy with an F1-score of 94.33%, outperforming Naive Bayes by 1.55 percentage points. The system was deployed as a web application with a FastAPI backend, providing real-time phishing classification with average response times of 127ms.
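
The feature-extraction step described above (lowercasing, tokenization, stop-word removal, then TF-IDF over unigrams and bigrams) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the stop-word list, the helper names (`tokenize`, `ngrams`, `tfidf`), and the unsmoothed `tf * log(N / df)` weighting are all assumptions, and lemmatization is omitted.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the abstract does not list one).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "your", "you", "for"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words (no lemmatization)."""
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return [w for w in cleaned.split() if w not in STOP_WORDS]


def ngrams(tokens):
    """Return the unigrams followed by the adjacent-word bigrams."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    term_lists = [ngrams(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


emails = ["Verify your account now", "Meeting notes for the account team"]
vectors = tfidf(emails)
```

In this toy corpus the word "account" appears in both emails and so receives zero weight, while the bigram "verify account" is distinctive to the first email. The resulting sparse vectors are the kind of input a Naive Bayes or Logistic Regression classifier would then be trained on.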

6.2 HC May 14
A Formative Study of Brief Affective Text as a Complement to Wearable Sensing for Longitudinal Student Health Monitoring

Tamunotonye Harry, Johanna Hidalgo, Matthew Price et al.

Wearable devices capture physiological and behavioral data with increasing fidelity, but the psychological context shaping these outcomes is difficult to recover from sensor data alone, limiting passive sensing utility for digital health. We examined whether ultra-brief naturalistic concern text could serve as a scalable complement to passive sensing. In a year-long study of 458 university students (3,610 person-waves) tracked with Oura rings, participants responded bimonthly to an open-ended prompt about what concerned them most; responses had a median length of three words. We compared dictionary-based, general pretrained, and domain-adapted NLP approaches using within-person mixed-effects models across nine sleep and physical activity outcomes. Weeks dominated by academic concern framing were associated with lower physical activity; weeks characterized by emotional exhaustion language were associated with poorer sleep quality and lower heart rate variability. General pretrained embeddings outperformed domain-adapted models for most outcomes, with domain adaptation showing relative advantage for autonomic outcomes. Zero-shot classification of concern topics produced no significant associations, while affective dimensions across all three methods were consistently associated with outcomes, indicating emotional register rather than topical content carries the signal. These findings offer design guidance: ultra-brief affective prompts enrich the psychological interpretability of passive physiological data at minimal burden.
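
A within-person analysis compares each participant against their own baseline rather than against other participants. One common preprocessing step for this, shown here as an assumption rather than as the study's documented procedure, is person-mean centering: subtracting each participant's average from their observations before modeling. The function name and record layout below are hypothetical.

```python
from collections import defaultdict


def person_mean_center(records):
    """Center each value on its participant's own mean.

    records: list of (person_id, value) pairs, one per person-wave.
    Returns a list of (person_id, centered_value) in the same order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for person_id, value in records:
        sums[person_id] += value
        counts[person_id] += 1
    return [
        (person_id, value - sums[person_id] / counts[person_id])
        for person_id, value in records
    ]


# Hypothetical sleep-score observations for two participants.
waves = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
centered = person_mean_center(waves)
```

After centering, a positive value means "higher than this participant's usual level", which is the quantity a within-person model relates to the outcome.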

CR Jun 30, 2025
Aim High, Stay Private: Differentially Private Synthetic Data Enables Public Release of Behavioral Health Information with High Utility

Mohsen Ghasemizade, Juniper Lovato, Christopher M. Danforth et al.

Sharing health and behavioral data raises significant privacy concerns, as conventional de-identification methods are susceptible to privacy attacks. Differential Privacy (DP) provides formal guarantees against re-identification risks, but practical implementation necessitates balancing privacy protection and the utility of data. We demonstrate the use of DP to protect individuals in a real behavioral health study, while making the data publicly available and retaining high utility for downstream users of the data. We use the Adaptive Iterative Mechanism (AIM) to generate DP synthetic data for Phase 1 of the Lived Experiences Measured Using Rings Study (LEMURS). The LEMURS dataset comprises physiological measurements from wearable devices (Oura rings) and self-reported survey data from first-year college students. We evaluate the synthetic datasets across a range of privacy budgets, from epsilon = 1 to 100, focusing on the trade-off between privacy and utility. We evaluate the utility of the synthetic data using a framework informed by actual uses of the LEMURS dataset. Our evaluation identifies the trade-off between privacy and utility across synthetic datasets generated with different privacy budgets. We find that synthetic datasets with epsilon = 5 preserve adequate predictive utility while significantly mitigating privacy risks. Our methodology establishes a reproducible framework for evaluating the practical impacts of epsilon on generating private synthetic datasets with numerous attributes and records, contributing to informed decision-making in data sharing practices.
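
To make the role of the privacy budget concrete, the sketch below uses the basic Laplace mechanism on a single count. This is not AIM, which is considerably more involved; it is a simpler, swapped-in technique chosen only to show how a smaller epsilon forces more noise. The function names and the sensitivity of 1 for a counting query are illustrative assumptions.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    return sensitivity / epsilon


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials.

    random.expovariate takes a rate, so rate 1/scale gives mean `scale`.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with epsilon-DP, assuming one person changes it by at most 1."""
    return true_count + laplace_noise(laplace_scale(1.0, epsilon), rng)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
for epsilon in (1, 5, 100):
    noisy = private_count(250, epsilon, rng)
    print(f"epsilon={epsilon:>3}  scale={laplace_scale(1.0, epsilon):.2f}  noisy count={noisy:.2f}")
```

The scale falls from 1.0 at epsilon = 1 to 0.01 at epsilon = 100, which mirrors the privacy-utility trade-off the abstract evaluates: larger budgets give more accurate releases and weaker guarantees.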

SOC-PH Jun 2, 2021
Quantifying language changes surrounding mental health on Twitter

Anne Marie Stupinski, Thayer Alshaabi, Michael V. Arnold et al.

Mental health challenges are thought to afflict around 10% of the global population each year, with many going untreated due to stigma and limited access to services. Here, we explore trends in words and phrases related to mental health through a collection of 1-, 2-, and 3-grams parsed from a data stream of roughly 10% of all English tweets since 2012. We examine temporal dynamics of mental health language, finding that the popularity of the phrase 'mental health' increased by nearly two orders of magnitude between 2012 and 2018. We observe that mentions of 'mental health' spike annually and reliably due to mental health awareness campaigns, as well as unpredictably in response to mass shootings, celebrities dying by suicide, and popular fictional stories portraying suicide. We find that the level of positivity of messages containing 'mental health', while stable through the growth period, has declined recently. Finally, we use the ratio of original tweets to retweets to quantify the fraction of appearances of mental health language due to social amplification. Since 2015, mentions of mental health have increasingly come from retweets, suggesting that stigma associated with discussion of mental health on Twitter has diminished with time.
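
The two measurements described above, n-gram extraction and the share of a phrase's appearances that come from retweets, can be sketched as follows. This is a simplified illustration with whitespace tokenization and a hypothetical `(text, is_retweet)` input format; the paper's actual parsing is not specified in the abstract.

```python
from collections import Counter


def extract_ngrams(text, max_n=3):
    """Return all 1-grams through max_n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def amplification_fraction(tweets, phrase):
    """Fraction of a phrase's appearances that occur in retweets.

    tweets: iterable of (text, is_retweet) pairs (a hypothetical input format).
    Returns 0.0 when the phrase never appears.
    """
    original = retweeted = 0
    for text, is_retweet in tweets:
        hits = Counter(extract_ngrams(text))[phrase]
        if is_retweet:
            retweeted += hits
        else:
            original += hits
    total = original + retweeted
    return retweeted / total if total else 0.0


sample = [
    ("mental health matters", False),
    ("Mental health matters", True),
    ("mental health day", True),
    ("good morning", False),
]
share = amplification_fraction(sample, "mental health")
```

In this toy sample, two of the three appearances of "mental health" are retweets, so the amplified share is two-thirds. Tracking that share over time is one way to express the trend the abstract reports since 2015.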