AIDec 2, 2025Code
Menta: A Small Language Model for On-Device Mental Health PredictionTianyi Zhang, Xiangyuan Xue, Lingyan Ruan et al.
Mental health conditions affect hundreds of millions globally, yet early detection remains limited. While large language models (LLMs) have shown promise in mental health applications, their size and computational demands hinder practical deployment. Small language models (SLMs) offer a lightweight alternative, but their use for social media--based mental health prediction remains largely underexplored. In this study, we introduce Menta, the first optimized SLM fine-tuned specifically for multi-task mental health prediction from social media data. Menta is jointly trained across six classification tasks using a LoRA-based framework, a cross-dataset strategy, and a balanced accuracy--oriented loss. Evaluated against nine state-of-the-art SLM baselines, Menta achieves an average improvement of 15.2\% across tasks covering depression, stress, and suicidality compared with the best-performing non--fine-tuned SLMs. It also achieves higher accuracy on depression and stress classification tasks compared to 13B-parameter LLMs, while being approximately 3.25x smaller. Moreover, we demonstrate real-time, on-device deployment of Menta on an iPhone 15 Pro Max, requiring only approximately 3GB RAM. Supported by a comprehensive benchmark against existing SLMs and LLMs, Menta highlights the potential for scalable, privacy-preserving mental health monitoring. Code is available at: https://xxue752-nz.github.io/menta-project/
HCJul 5, 2024
Enabling On-Device LLMs Personalization with Smartphone SensingShiquan Zhang, Ying Ma, Le Fang et al.
This demo presents a novel end-to-end framework that combines on-device large language models (LLMs) with smartphone sensing technologies to achieve context-aware and personalized services. The framework addresses critical limitations of current personalization solutions via cloud LLMs, such as privacy concerns, latency and cost, and limited personal information. To achieve this, we innovatively proposed deploying LLMs on smartphones with multimodal sensor data through context-aware sensing and customized prompt engineering, ensuring privacy and enhancing personalization performance. A case study involving a university student demonstrated the capability of the framework to provide tailored recommendations. In addition, we show that the framework achieves the best trade-off in privacy, performance, latency, cost, battery and energy consumption between on-device and cloud LLMs. To the best of our knowledge, this is the first framework to provide on-device LLMs personalization with smartphone sensing. Future work will incorporate more diverse sensor data and involve extensive user studies to enhance personalization. Our proposed framework has the potential to substantially improve user experiences across domains including healthcare, productivity, and entertainment.
HCApr 20
Do LLMs Need to See Everything? A Benchmark and Study of Failures in LLM-driven Smartphone Automation using Screentext vs. ScreenshotsShiquan Zhang, Tianyi Zhang, Le Fang et al.
With the rapid advancement of large language models (LLMs), mobile agents have emerged as promising tools for phone automation, simulating human interactions on screens to accomplish complex tasks. However, these agents often suffer from low accuracy, misinterpretation of user instructions, and failure on challenging tasks, with limited prior work examining why and where they fail. To address this, we introduce DailyDroid, a benchmark of 75 tasks in five scenarios across 25 Android apps, spanning three difficulty levels to mimic everyday smartphone use. We evaluate it using text-only and multimodal (text + screenshot) inputs on GPT-4o and o4-mini across 300 trials, revealing comparable performance with multimodal inputs yielding marginally higher success rates. Through in-depth failure analysis, we compile a handbook of common failures. Our findings reveal critical issues in UI accessibility, input modalities, and LLM/app design, offering implications for future mobile agents, applications, and UI development.
HCAug 23, 2024
Predicting Affective States from Screen Text SentimentSongyan Teng, Tianyi Zhang, Simon D'Alfonso et al.
The proliferation of mobile sensing technologies has enabled the study of various physiological and behavioural phenomena through unobtrusive data collection from smartphone sensors. This approach offers real-time insights into individuals' physical and mental states, creating opportunities for personalised treatment and interventions. However, the potential of analysing the textual content viewed on smartphones to predict affective states remains underexplored. To better understand how the screen text that users are exposed to and interact with can influence their affects, we investigated a subset of data obtained from a digital phenotyping study of Australian university students conducted in 2023. We employed linear regression, zero-shot, and multi-shot prompting using a large language model (LLM) to analyse relationships between screen text and affective states. Our findings indicate that multi-shot prompting substantially outperforms both linear regression and zero-shot prompting, highlighting the importance of context in affect prediction. We discuss the value of incorporating textual and sentiment data for improving affect prediction, providing a basis for future advancements in understanding smartphone use and wellbeing.
HCJul 11, 2024
Leveraging LLMs to Predict Affective States via Smartphone Sensor FeaturesTianyi Zhang, Songyan Teng, Hong Jia et al.
As mental health issues for young adults present a pressing public health concern, daily digital mood monitoring for early detection has become an important prospect. An active research area, digital phenotyping, involves collecting and analysing data from personal digital devices such as smartphones (usage and sensors) and wearables to infer behaviours and mental health. Whilst this data is standardly analysed using statistical and machine learning approaches, the emergence of large language models (LLMs) offers a new approach to make sense of smartphone sensing data. Despite their effectiveness across various domains, LLMs remain relatively unexplored in digital mental health, particularly in integrating mobile sensor data. Our study aims to bridge this gap by employing LLMs to predict affect outcomes based on smartphone sensing data from university students. We demonstrate the efficacy of zero-shot and few-shot embedding LLMs in inferring general wellbeing. Our findings reveal that LLMs can make promising predictions of affect measures using solely smartphone sensing data. This research sheds light on the potential of LLMs for affective state prediction, emphasizing the intricate link between smartphone behavioral patterns and affective states. To our knowledge, this is the first work to leverage LLMs for affective state prediction and digital phenotyping tasks.
HCNov 7, 2024
AWARE Narrator and the Utilization of Large Language Models to Extract Behavioral Insights from Smartphone Sensing DataTianyi Zhang, Miu Kojima, Simon D'Alfonso
Smartphones, equipped with an array of sensors, have become valuable tools for personal sensing. Particularly in digital health, smartphones facilitate the tracking of health-related behaviors and contexts, contributing significantly to digital phenotyping, a process where data from digital interactions is analyzed to infer behaviors and assess mental health. Traditional methods process raw sensor data into information features for statistical and machine learning analyses. In this paper, we introduce a novel approach that systematically converts smartphone-collected data into structured, chronological narratives. The AWARE Narrator translates quantitative smartphone sensing data into English language descriptions, forming comprehensive narratives of an individual's activities. We apply the framework to the data collected from university students over a week, demonstrating the potential of utilizing the narratives to summarize individual behavior, and analyzing psychological states by leveraging large language models.