AIHCNov 29, 2024

Fine-Tuning Open-Weight Language Models to Deliver Cognitive Behavioral Therapy for Depression: A Feasibility Study

arXiv:2412.00251v23 citationsh-index: 1
Originality Synthesis-oriented
AI Analysis

This addresses the problem of limited access to CBT therapy for individuals with depression, though it is an incremental application of existing fine-tuning methods to a new domain.

This study explored the feasibility of fine-tuning small open-weight language models to deliver Cognitive Behavioral Therapy (CBT) for depression, finding that CBT-tuned models significantly outperformed their instruct-tuned counterparts with an average improvement of 11.33 points on a CBT fidelity scale.

Cognitive Behavioral Therapy (CBT) is a well-established, evidence-based treatment for Major Depressive Disorder. Unfortunately, there exist significant barriers to individuals accessing CBT, including cost, scarcity of therapists and stigma. This study explores the feasibility of fine-tuning small open weight large language models (LLMs) to deliver CBT for depression. Using synthetic CBT transcripts generated by the Nous Research fine-tune of Llama 3.1 405b, we fine-tuned three models: Mistral 7b v0.3, Qwen 2.5 7b, and Llama 3.1 8b. CBT fidelity was evaluated through a modified Cognitive Therapy Rating Scale (CTRS). All fine-tuned models were compared against each other, as well as their instruct-tuned variants. Simulated patient transcripts were generated for the purpose of evaluating model performance, with the instruct and CBT-tuned models acting as the therapist and DeepSeek-V2.5 acting as the patient. These simulated transcripts were evaluated on a modified CTRS by Gemini 1.5 Pro-002. Our findings demonstrated that the CBT-tuned models significantly outperformed their instruct-tuned counterparts, with an average improvement of 11.33 points (p < 0.001) on total CTRS score. Llama 3.1 8b had the strongest performance (mean CTRS score 67.86 +/- 7.24), followed by Qwen 2.5 7b (64.28 +/- 9.55) and Mistral 7b v0.3 (64.17 +/- 9.79), with these differences between models being statistically significant. The CBT-tuned models were competent in implementing core CBT techniques and providing empathetic responses, however, there were limitations observed in agenda adherence, exploration depth and long-context coherence. This study establishes that CBT specific fine-tuning can effectively encode therapeutic competencies in small LLMs, though significant technical and ethical considerations must be resolved prior to clinical deployment.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes