HC · Apr 30

Engagement Phenotypes for a Sample of 102,684 AI Mental Health Chatbot Users and Dose-Response Associations with Clinical Outcomes

Emma C. Wolfe, Ting Su, Olivier Tieleman, Thomas D. Hull, Matteo Malgaroli, Caitlin A. Stamatis

arXiv:2605.00275 · 10.4

Predicted impact: top 37% in HC · last 90 days

Originality: Incremental advance

AI Analysis

For mental health chatbot developers and clinicians, this study provides naturalistic evidence that engagement is multidimensional and that different usage patterns predict different clinical outcomes. It cautions against treating session counts as a primary engagement metric.

This study identified five engagement phenotypes among 102,684 users of an AI mental health chatbot. Subsamples showed significant pre-post improvements in depression (d = -0.51), anxiety (d = -0.57), and social support (d = 0.22). Depression improvement followed a dose-response gradient across phenotypes, which was replicated in a larger sample using model-predicted PHQ-9 scores (n = 23,813): Power Users d = -0.54, Early Dropouts d = -0.13.
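The effect sizes above are standardized pre-post changes. The listing does not say which variant of Cohen's d the authors computed, so the following is only a minimal sketch of one common choice for paired data (mean change divided by the standard deviation of the change scores). The function name and the toy scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def paired_cohens_d(pre, post):
    """Standardized mean change: mean(post - pre) / SD(post - pre).

    Assumption: this is the d_z variant for paired data; the paper may
    have used a different denominator (e.g. the baseline SD).
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation of the change scores
    if sd == 0:
        raise ValueError("change scores have zero variance")
    return mean(diffs) / sd


# Hypothetical symptom scores for five users (lower = fewer symptoms).
toy_pre = [14, 10, 18, 12, 16]
toy_post = [11, 9, 12, 11, 12]
toy_d = paired_cohens_d(toy_pre, toy_post)
print(f"toy d = {toy_d:.2f}")
```

Under this convention a negative d on a symptom scale such as the PHQ-9 or GAD-7 means scores fell, while a positive d on the MSPSS means perceived social support rose, which matches the signs reported above.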

Background: Conversational AI chatbots are emerging as scalable mental health tools, but little is known about real-world engagement or its relationship to clinical outcomes.

Objective: To characterize engagement phenotypes among users of Ash, a purpose-built AI mental health chatbot, and examine associations with clinical change and working alliance.

Methods: K-means clustering across eight behavioral features identified engagement phenotypes among 102,684 users. Subsamples completed the PHQ-9 (n=298), GAD-7 (n=298), and MSPSS (social support; n=194) at baseline and 3 weeks; 11,437 users completed the baseline Working Alliance Inventory (WAI).

Results: Five engagement phenotypes emerged: Early Dropouts (52.2%), Power Users (1.6%), Intensive Users (4.1%), Weekly Users (25.3%), and a novel Concentrated User pattern (16.8%). Across users, 66.9% had at least one overnight session (9pm-5am). Significant pre-post improvements occurred in depression (d = -0.51), anxiety (d = -0.57), and social support (d = 0.22). An observed dose-response gradient in self-reported depression improvement was replicated in a larger sample with model-predicted PHQ-9 (n = 23,813; Power Users d = -0.54; Early Dropouts d = -0.13). Higher working alliance predicted depression improvement and moderated the engagement-social support relationship.

Conclusions: Engagement with AI mental health tools is multidimensional, and different clinical outcomes respond to different dimensions of use. Findings caution against treating session counts as a primary engagement metric and offer naturalistic evidence for the clinical value of purpose-built conversational AI.
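The Methods describe k-means clustering over eight behavioral features, but the abstract does not name those features or the preprocessing. As a rough illustration of the technique only, the sketch below runs plain Lloyd's-algorithm k-means on z-scored toy data. The two features (sessions and active days), the six toy users, and the choice of k = 2 are all hypothetical stand-ins, not the study's data or pipeline.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [-1] * len(rows)
    for _ in range(iters):
        # Assignment step: attach each user to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids


# Hypothetical users: [sessions, active_days] (the study used eight features).
toy_users = [[1, 1], [1, 2], [2, 1], [20, 25], [22, 24], [21, 26]]
labels, centroids = kmeans(standardize(toy_users), k=2)
print(labels)
```

On real data the number of clusters would itself need to be justified (the study reports five phenotypes), and results can depend on the random initialization, so multiple restarts are usually advisable.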
