CLMar 29, 2024
Fine-tuning Large Language Models for Automated Diagnostic Screening SummariesManjeet Yadav, Nilesh Kumar Sahu, Mudita Chaturvedi et al.
Improving mental health support in developing countries is a pressing need. One potential solution is the development of scalable, automated systems to conduct diagnostic screenings, which could help alleviate the burden on mental health professionals. In this work, we evaluate several state-of-the-art Large Language Models (LLMs), with and without fine-tuning, on our custom dataset for generating concise summaries from mental state examinations. We rigorously evaluate four different models for summary generation using established ROUGE metrics and input from human evaluators. The results highlight that our top-performing fine-tuned model outperforms existing models, achieving ROUGE-1 and ROUGE-L values of 0.810 and 0.764, respectively. Furthermore, we assessed the fine-tuned model's generalizability on a publicly available D4 dataset, and the outcomes were promising, indicating its potential applicability beyond our custom dataset.
CVFeb 22, 2025
AnxietyFaceTrack: A Smartphone-Based Non-Intrusive Approach for Detecting Social Anxiety Using Facial FeaturesNilesh Kumar Sahu, Snehil Gupta, Haroon R Lone
Social Anxiety Disorder (SAD) is a widespread mental health condition, yet its lack of objective markers hinders timely detection and intervention. While previous research has focused on behavioral and non-verbal markers of SAD in structured activities (e.g., speeches or interviews), these settings fail to replicate real-world, unstructured social interactions fully. Identifying non-verbal markers in naturalistic, unstaged environments is essential for developing ubiquitous and non-intrusive monitoring solutions. To address this gap, we present AnxietyFaceTrack, a study leveraging facial video analysis to detect anxiety in unstaged social settings. A cohort of 91 participants engaged in a social setting with unfamiliar individuals and their facial videos were recorded using a low-cost smartphone camera. We examined facial features, including eye movements, head position, facial landmarks, and facial action units, and used self-reported survey data to establish ground truth for multiclass (anxious, neutral, non-anxious) and binary (e.g., anxious vs. neutral) classifications. Our results demonstrate that a Random Forest classifier trained on the top 20% of features achieved the highest accuracy of 91.0% for multiclass classification and an average accuracy of 92.33% across binary classifications. Notably, head position and facial landmarks yielded the best performance for individual facial regions, achieving 85.0% and 88.0% accuracy, respectively, in multiclass classification, and 89.66% and 91.0% accuracy, respectively, across binary classifications. This study introduces a non-intrusive, cost-effective solution that can be seamlessly integrated into everyday smartphones for continuous anxiety monitoring, offering a promising pathway for early detection and intervention.
CVFeb 20
DohaScript: A Large-Scale Multi-Writer Dataset for Continuous Handwritten Hindi TextKunwar Arpit Singh, Ankush Prakash, Haroon R Lone
Despite having hundreds of millions of speakers, handwritten Devanagari text remains severely underrepresented in publicly available benchmark datasets. Existing resources are limited in scale, focus primarily on isolated characters or short words, and lack controlled lexical content and writer level diversity, which restricts their utility for modern data driven handwriting analysis. As a result, they fail to capture the continuous, fused, and structurally complex nature of Devanagari handwriting, where characters are connected through a shared shirorekha (horizontal headline) and exhibit rich ligature formations. We introduce DohaScript, a large scale, multi writer dataset of handwritten Hindi text collected from 531 unique contributors. The dataset is designed as a parallel stylistic corpus, in which all writers transcribe the same fixed set of six traditional Hindi dohas (couplets). This controlled design enables systematic analysis of writer specific variation independent of linguistic content, and supports tasks such as handwriting recognition, writer identification, style analysis, and generative modeling. The dataset is accompanied by non identifiable demographic metadata, rigorous quality curation based on objective sharpness and resolution criteria, and page level layout difficulty annotations that facilitate stratified benchmarking. Baseline experiments demonstrate clear quality separation and strong generalization to unseen writers, highlighting the dataset's reliability and practical value. DohaScript is intended to serve as a standardized and reproducible benchmark for advancing research on continuous handwritten Devanagari text in low resource script settings.
CLJul 11, 2025
An Offline Mobile Conversational Agent for Mental Health Support: Learning from Emotional Dialogues and Psychological Texts with Student-Centered EvaluationVimaleswar A, Prabhu Nandan Sahu, Nilesh Kumar Sahu et al.
Mental health plays a crucial role in the overall well-being of an individual. In recent years, digital platforms have been increasingly used to expand mental health and emotional support. However, there are persistent challenges related to limited user accessibility, internet connectivity, and data privacy, which highlight the need for an offline, smartphone-based solution. To address these challenges, we propose EmoSApp (Emotional Support App): an entirely offline, smartphone-based conversational app designed for mental health and emotional support. The system leverages Large Language Models (LLMs), specifically fine-tuned, quantized and deployed using Torchtune and Executorch for resource-constrained devices, allowing all inferences to occur on the smartphone. To equip EmoSApp with robust domain expertise, we fine-tuned the LLaMA-3.2-1B-Instruct model on our custom curated ``Knowledge dataset'' of 14,582 mental-health QA pairs, along with the multi-turn conversational data. Through qualitative human evaluation with the student population, we demonstrate that EmoSApp has the ability to respond coherently, empathetically, maintain interactive dialogue, and provide relevant suggestions to user's mental health problems. Additionally, quantitative evaluations on nine standard commonsense and reasoning benchmarks demonstrate the efficacy of our fine-tuned, quantized model in low-resource settings. By prioritizing on-device deployment and specialized domain adaptation, EmoSApp serves as a blueprint for future innovations in portable, secure, and highly tailored AI-driven mental health solutions.
HCJul 10, 2025
Real-World Receptivity to Adaptive Mental Health Interventions: Findings from an In-the-Wild StudyNilesh Kumar Sahu, Aditya Sneh, Snehil Gupta et al.
The rise of mobile health (mHealth) technologies has enabled real-time monitoring and intervention for mental health conditions using passively sensed smartphone data. Building on these capabilities, Just-in-Time Adaptive Interventions (JITAIs) seek to deliver personalized support at opportune moments, adapting to users' evolving contexts and needs. Although prior research has examined how context affects user responses to generic notifications and general mHealth messages, relatively little work has explored its influence on engagement with actual mental health interventions. Furthermore, while much of the existing research has focused on detecting when users might benefit from an intervention, less attention has been paid to understanding receptivity, i.e., users' willingness and ability to engage with and act upon the intervention. In this study, we investigate user receptivity through two components: acceptance(acknowledging or engaging with a prompt) and feasibility (ability to act given situational constraints). We conducted a two-week in-the-wild study with 70 students using a custom Android app, LogMe, which collected passive sensor data and active context reports to prompt mental health interventions. The adaptive intervention module was built using Thompson Sampling, a reinforcement learning algorithm. We address four research questions relating smartphone features and self-reported contexts to acceptance and feasibility, and examine whether an adaptive reinforcement learning approach can optimize intervention delivery by maximizing a combined receptivity reward. Our results show that several types of passively sensed data significantly influenced user receptivity to interventions. Our findings contribute insights into the design of context-aware, adaptive interventions that are not only timely but also actionable in real-world settings.
SPMar 24, 2025
Are Anxiety Detection Models Generalizable? A Cross-Activity and Cross-Population Study Using WearablesNilesh Kumar Sahu, Snehil Gupta, Haroon R Lone
Anxiety-provoking activities, such as public speaking, can trigger heightened anxiety responses in individuals with anxiety disorders. Recent research suggests that physiological signals, including electrocardiogram (ECG) and electrodermal activity (EDA), collected via wearable devices, can be used to detect anxiety in such contexts through machine learning models. However, the generalizability of these anxiety prediction models across different activities and diverse populations remains underexplored-an essential step for assessing model bias and fostering user trust in broader applications. To address this gap, we conducted a study with 111 participants who engaged in three anxiety-provoking activities. Utilizing both our collected dataset and two well-known publicly available datasets, we evaluated the generalizability of anxiety detection models within participants (for both same-activity and cross-activity scenarios) and across participants (within-activity and cross-activity). In total, we trained and tested more than 3348 anxiety detection models (using six classifiers, 31 feature sets, and 18 train-test configurations). Our results indicate that three key metrics-AUROC, recall for anxious states, and recall for non-anxious states-were slightly above the baseline score of 0.5. The best AUROC scores ranged from 0.62 to 0.73, with recall for the anxious class spanning 35.19% to 74.3%. Interestingly, model performance (as measured by AUROC) remained relatively stable across different activities and participant groups, though recall for the anxious class did exhibit some variation.
CVDec 1, 2024
Thermal Vision: Pioneering Non-Invasive Temperature Tracking in Congested SpacesArijit Samal, Haroon R Lone
Non-invasive temperature monitoring of individuals plays a crucial role in identifying and isolating symptomatic individuals. Temperature monitoring becomes particularly vital in settings characterized by close human proximity, often referred to as dense settings. However, existing research on non-invasive temperature estimation using thermal cameras has predominantly focused on sparse settings. Unfortunately, the risk of disease transmission is significantly higher in dense settings like movie theaters or classrooms. Consequently, there is an urgent need to develop robust temperature estimation methods tailored explicitly for dense settings. Our study proposes a non-invasive temperature estimation system that combines a thermal camera with an edge device. Our system employs YOLO models for face detection and utilizes a regression framework for temperature estimation. We evaluated the system on a diverse dataset collected in dense and sparse settings. Our proposed face detection model achieves an impressive mAP score of over 84 in both in-dataset and cross-dataset evaluations. Furthermore, the regression framework demonstrates remarkable performance with a mean square error of 0.18$^{\circ}$C and an impressive $R^2$ score of 0.96. Our experiments' results highlight the developed system's effectiveness, positioning it as a promising solution for continuous temperature monitoring in real-world applications. With this paper, we release our dataset and programming code publicly.