HCMay 20

When Support Escalates Distress: Regulation and Escalation in LLM Responses to Venting and Advice-Seeking

arXiv:2605.2156955.5
Predicted impact top 26% in HC · last 90 daysOriginality Incremental advance
AI Analysis

For developers and deployers of LLMs in mental health support, this work reveals a critical blind spot in current safety evaluations—responses that feel supportive can simultaneously intensify distress, and empathy metrics alone are insufficient.

The study investigates whether LLM responses to venting vs. advice-seeking posts regulate or amplify emotional distress. Using 178,800 Reddit posts, they find that GPT-5.3's responses mirror help-seeking style: venting elicits both more regulation and more escalation, with therapist personas reducing escalation without sacrificing user experience, while lay raters fail to detect escalation.

Large language models are increasingly used for mental health support, yet little is known about whether their responses are psychologically safe across different help-seeking styles. We examine a foundational distinction in emotional disclosure, venting vs. advice-seeking, and whether LLMs respond in ways that regulate or amplify distress. Using 178,800 Reddit posts, we first show the two help-seeking styles are linguistically distinguishable at scale. We then introduce a measurement framework grounded in interpersonal emotion regulation theory that captures Regulation and Escalation as empirically independent dimensions. Across persona conditions (default, friend, therapist), GPT-5.3 responses systematically mirror help-seeking style: venting elicits more regulation, but also more escalation. Therapist personas reduce escalation while maintaining regulation, whereas friend personas increase both. A crowdsourced human study finds no user experience penalty for the safer therapist condition, but reveals that lay raters cannot reliably detect escalation without expert knowledge. Responses that feel supportive may simultaneously intensify distress in ways standard safety evaluation cannot see, and empathy metrics alone cannot replace a framework that measures both.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes