Thomas D. Hull

HC
h-index: 12
3 papers
2 citations
Novelty: 52%
AI Score: 43

3 Papers

CL · Dec 23, 2025
Adversarial Training for Failure-Sensitive User Simulation in Mental Health Dialogue Optimization

Ziyi Zhu, Olivier Tieleman, Caitlin A. Stamatis et al. · cambridge

Realistic user simulation is crucial for training and evaluating task-oriented dialogue (TOD) systems, yet creating simulators that accurately replicate human behavior remains challenging. A key property of effective simulators is their ability to expose failure modes of the systems they evaluate. We present an adversarial training framework that iteratively improves user simulator realism through a competitive dynamic between a generator (user simulator) and a discriminator. Applied to mental health support chatbots, our approach demonstrates that fine-tuned simulators dramatically outperform zero-shot base models at surfacing system issues, and adversarial training further enhances diversity, distributional alignment, and predictive validity. The resulting simulator achieves a strong correlation between simulated and real failure occurrence rates across diverse chatbot configurations while maintaining low distributional divergence of failure modes. Discriminator accuracy decreases drastically after three adversarial iterations, suggesting improved realism. These results provide evidence that adversarial training is a promising approach for creating realistic user simulators in mental health support TOD domains, enabling rapid, reliable, and cost-effective system evaluation before deployment.

HC · Feb 24
Talking to a Human as an Attitudinal Barrier: A Mixed Methods Evaluation of Stigma, Access, and the Appeal of AI Mental Health Support

Caitlin A. Stamatis, Emma C. Wolfe, Matteo Malgaroli et al.

Background: Many people who could benefit from therapy do not receive it. Conversational AI is increasingly used for mental health support, yet it is unclear which barriers AI helps mitigate. We examined whether evaluation-sensitive (shame/stigma) and structural barriers (cost/coverage/access) to psychotherapy predict perceived helpfulness of an AI mental health conversational tool (Ash), and whether effects differ by prior therapy experience or user engagement. Methods: Participants (n=395) rated Ash's helpfulness (1-5) and described barriers to therapy. Open-text responses were coded for shame/stigma, access, and cost/coverage themes. Linear regressions examined associations between barriers and perceived helpfulness, adjusting for demographics and mental health, with moderation by therapy experience. Results: Shame/stigma (B=.45, p<.001) and access barriers (B=.31, p=.020) predicted higher perceived helpfulness, but cost/coverage did not (B=.13, p=.262). Prior therapy experience moderated the shame effect (interaction B=.56, p=.036): shame predicted higher helpfulness among therapy-experienced users (Δ=.62, p<.001) but not therapy-naive users (Δ=.03, p=.877). Among therapy-experienced participants (n=258), shame/stigma (B=.75, p<.001) and access barriers (B=.51, p=.006) predicted rating Ash more favorably. Access barriers predicted higher engagement (IRR=1.64, p<.001) and cost/coverage barriers predicted 70% more sessions (IRR=1.70, p<.001). Shame/stigma was not associated with total sessions (IRR=.80, p=.094). Conclusions: AI mental health support was perceived as most helpful by users facing shame/stigma and access barriers, particularly for therapy-experienced individuals. Access and cost barriers were most predictive of usage intensity, suggesting unmet needs. Findings highlight the importance of aligning AI tools for emotional support with user-reported barriers.
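The abstract reads IRR=1.70 as "70% more sessions". A minimal sketch of that conversion follows, assuming the usual interpretation of an incidence rate ratio as a multiplicative change in the expected count. The function name `irr_to_percent_change` is hypothetical; the three IRR values are the ones reported above.

```python
def irr_to_percent_change(irr: float) -> float:
    """Convert an incidence rate ratio into a percent change in expected count.

    An IRR of 1.0 means no change; values above 1.0 mean more events,
    values below 1.0 mean fewer.
    """
    return (irr - 1.0) * 100.0


# IRRs reported in the abstract (outcome: number of sessions).
reported = {
    "access barriers": 1.64,
    "cost/coverage barriers": 1.70,
    "shame/stigma (not significant, p=.094)": 0.80,
}

for barrier, irr in reported.items():
    print(f"{barrier}: IRR={irr:.2f} -> {irr_to_percent_change(irr):+.0f}% sessions")
# access barriers: IRR=1.64 -> +64% sessions
# cost/coverage barriers: IRR=1.70 -> +70% sessions
# shame/stigma (not significant, p=.094): IRR=0.80 -> -20% sessions
```

The shame/stigma estimate is included only to show the direction of the conversion; the abstract reports it as not statistically significant.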

70.0 · HC · Apr 30
Engagement Phenotypes for a Sample of 102,684 AI Mental Health Chatbot Users and Dose-Response Associations with Clinical Outcomes

Emma C. Wolfe, Ting Su, Olivier Tieleman et al.

Background: Conversational AI chatbots are emerging as scalable mental health tools, but little is known about real-world engagement or its relationship to clinical outcomes. Objective: To characterize engagement phenotypes among users of Ash, a purpose-built AI mental health chatbot, and examine associations with clinical change and working alliance. Methods: K-means clustering across eight behavioral features identified engagement phenotypes among 102,684 users. Subsamples completed the PHQ-9 (n=298), GAD-7 (n=298), and MSPSS (social support; n=194) at baseline and 3 weeks; 11,437 users completed baseline Working Alliance Inventory (WAI). Results: Five engagement phenotypes emerged: Early Dropouts (52.2%), Power Users (1.6%), Intensive Users (4.1%), Weekly Users (25.3%), and a novel Concentrated User pattern (16.8%); across users, 66.9% had at least one overnight session (9pm-5am). Significant pre-post improvements occurred in depression (d = -0.51), anxiety (d = -0.57), and social support (d = 0.22). An observed dose-response gradient in self-reported depression improvement was replicated in a larger sample with model-predicted PHQ-9 (n = 23,813; Power Users d = -0.54; Early Dropouts d = -0.13). Higher working alliance predicted depression improvement and moderated the engagement-social support relationship. Conclusions: Engagement with AI mental health tools is multidimensional, and different clinical outcomes respond to different dimensions of use. Findings caution against treating session counts as a primary engagement metric and offer naturalistic evidence for the clinical value of purpose-built conversational AI.
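The clustering step named in the Methods can be illustrated with a small sketch of k-means (Lloyd's algorithm). This is not the authors' pipeline: the data are synthetic, the two groups and their means are invented for illustration, the evenly spaced centroid initialization is an assumption made to keep the run deterministic, and only the count of eight features comes from the abstract.

```python
import math
import random


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    n = len(points)
    # Assumption: deterministic init from evenly spaced rows (not from the paper).
    centroids = [list(points[(i * n) // k]) for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Synthetic stand-in for eight behavioral features per user (hypothetical values).
rng = random.Random(0)
N_FEATURES = 8
light = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(30)]
heavy = [[rng.gauss(10.0, 1.0) for _ in range(N_FEATURES)] for _ in range(30)]
users = light + heavy

centroids, labels = kmeans(users, k=2)
print("cluster sizes:", [labels.count(j) for j in range(2)])
```

With real behavioral features on different scales, standardizing each feature before clustering is a common precaution, since k-means relies on Euclidean distance.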