HC · CL · Sep 3, 2024

Therapy as an NLP Task: Psychologists' Comparison of LLMs and Human Peers in CBT

arXiv:2409.02244v2 · 28 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This research evaluates LLMs as scalable therapists for mental health support. It highlights limitations in their session-level performance and calls for hybrid human-AI workflows. The advance is incremental, building on prior single-response studies.

The study compared the session-level behaviors of human counselors with those of an LLM prompted to deliver single-session Cognitive Behavioral Therapy (CBT). Human counselors excelled at relational strategies (small talk, self-disclosure, and culturally situated language) that led to higher empathy and collaboration. The LLM showed higher procedural adherence to CBT techniques but struggled to sustain collaboration and misread cultural cues, indicating that therapy cannot be reduced to an NLP task.

Large language models (LLMs) are being used as ad-hoc therapists. Research suggests that LLMs outperform human counselors when generating a single, isolated empathetic response; however, their session-level behavior remains understudied. In this study, we compare the session-level behaviors of human counselors with those of an LLM prompted by a team of peer counselors to deliver single-session Cognitive Behavioral Therapy (CBT). Our three-stage, mixed-methods study involved: a) a year-long ethnography of a text-based support platform where seven counselors iteratively refined CBT prompts through self-counseling and weekly focus groups; b) the manual simulation of human counselor sessions with a CBT-prompted LLM, given the full patient dialogue and contextual notes; and c) session evaluations of both human and LLM sessions by three licensed clinical psychologists using CBT competence measures. Our results show a clear trade-off. Human counselors excel at relational strategies -- small talk, self-disclosure, and culturally situated language -- that lead to higher empathy, collaboration, and deeper user reflection. LLM counselors demonstrate higher procedural adherence to CBT techniques but struggle to sustain collaboration, misread cultural cues, and sometimes produce "deceptive empathy," i.e., formulaic warmth that can inflate users' expectations of genuine human care. Taken together, our findings imply that while LLMs might outperform counselors in generating single empathetic responses, their ability to lead sessions is more limited, highlighting that therapy cannot be reduced to a standalone natural language processing (NLP) task. We call for carefully designed human-AI workflows in scalable support: LLMs can scaffold evidence-based techniques, while peers provide relational support. We conclude by mapping concrete design opportunities and ethical guardrails for such hybrid systems.

