CL | Jun 18, 2024

Towards a Client-Centered Assessment of LLM Therapists by Client Simulation

arXiv:2406.12266v2 | 30 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for ethical and consistent assessment of LLM-based therapy tools, but the advance is incremental: it adapts existing simulation methods from clinical education to AI.

The paper tackles the problem of assessing LLM therapists from the client's perspective by proposing ClientCAST, which uses LLMs to simulate clients for safe, scalable evaluation. The approach is used to evaluate LLM therapists built on Claude-3, GPT-3.5, LLaMA3-70B, and Mixtral 8*7B; the abstract reports no concrete performance numbers.

Although there is a growing belief that LLMs can be used as therapists, exploration of LLMs' capabilities and inefficacy, particularly from the client's perspective, remains limited. This work focuses on a client-centered assessment of LLM therapists with the involvement of simulated clients, a standard approach in clinical medical education. However, there are two challenges when applying the approach to assess LLM therapists at scale. Ethically, asking humans to frequently mimic clients and exposing them to potentially harmful LLM outputs can be risky and unsafe. Technically, it can be difficult to consistently compare the performances of different LLM therapists interacting with the same client. To this end, we adopt LLMs to simulate clients and propose ClientCAST, a client-centered approach to assessing LLM therapists by client simulation. Specifically, the simulated client is utilized to interact with LLM therapists and complete questionnaires related to the interaction. Based on the questionnaire results, we assess LLM therapists from three client-centered aspects: session outcome, therapeutic alliance, and self-reported feelings. We conduct experiments to examine the reliability of ClientCAST and use it to evaluate LLM therapists implemented by Claude-3, GPT-3.5, LLaMA3-70B, and Mixtral 8*7B. Code is released at https://github.com/wangjs9/ClientCAST.
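
The loop described in the abstract (a simulated client talks to a therapist model, then fills in questionnaires that are scored on three aspects) can be sketched roughly as follows. This is a minimal illustration, not the released ClientCAST code: the names `simulate_session`, `assess`, `toy_client`, `toy_therapist`, and `toy_questionnaire`, the 1-5 rating scale, and the per-aspect averaging are all assumptions made for the example. In a real setup each speaker would wrap an LLM call; plain functions stand in here so the sketch runs without model access.

```python
from statistics import mean
from typing import Callable, Dict, List

# A speaker maps the dialogue so far to its next utterance.
# In practice this would wrap an LLM call (assumption: plain callables here).
Speaker = Callable[[List[str]], str]

# The three client-centered aspects named in the abstract.
ASPECTS = ("session_outcome", "therapeutic_alliance", "self_reported_feelings")


def simulate_session(client: Speaker, therapist: Speaker, max_turns: int = 3) -> List[str]:
    """Alternate client and therapist turns and return the transcript."""
    history: List[str] = []
    for _ in range(max_turns):
        history.append("Client: " + client(history))
        history.append("Therapist: " + therapist(history))
    return history


def assess(answers: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the questionnaire item ratings within each aspect (hypothetical scoring)."""
    return {aspect: mean(answers[aspect]) for aspect in ASPECTS}


# Toy stand-ins (hypothetical) for the simulated client and the therapist under test.
def toy_client(history: List[str]) -> str:
    return "I have been feeling overwhelmed lately."


def toy_therapist(history: List[str]) -> str:
    return "That sounds hard. What has been weighing on you most?"


def toy_questionnaire(history: List[str]) -> Dict[str, List[int]]:
    # Hypothetical 1-5 ratings the simulated client would give after the session.
    return {
        "session_outcome": [4, 5],
        "therapeutic_alliance": [3, 4, 5],
        "self_reported_feelings": [2, 4],
    }


transcript = simulate_session(toy_client, toy_therapist, max_turns=3)
scores = assess(toy_questionnaire(transcript))
print(scores)
```

Keeping the client fixed and swapping only the `therapist` callable is what would make scores comparable across therapist models, which is the consistency concern the abstract raises.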

Code Implementations: 1 repo