CL · AI · CY · May 21, 2025

The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support

arXiv:2505.15065v2 · 6 citations · h-index: 5 · Has Code · EMNLP
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for resource-efficient, emotionally intelligent systems for mental health support, though its contribution is incremental: it evaluates existing small models on a new dataset.

This paper tackles the problem of generating empathetic responses for individuals with PTSD using small language models. It finds that fine-tuning improves empathetic capabilities and that the models often approach human-rated empathy levels, although gains vary across scenarios and smaller models hit a knowledge transfer ceiling.

This paper investigates the capacity of small language models (0.5B-5B parameters) to generate empathetic responses for individuals with PTSD. We introduce Trauma-Informed Dialogue for Empathy (TIDE), a novel dataset comprising 10,000 two-turn conversations across 500 diverse, clinically grounded PTSD personas (https://huggingface.co/datasets/yenopoya/TIDE). Using frontier model outputs as ground truth, we evaluate eight small LLMs in zero-shot settings and after fine-tuning. Fine-tuning enhances empathetic capabilities, improving cosine similarity and perceived empathy, although gains vary across emotional scenarios and smaller models exhibit a "knowledge transfer ceiling." As expected, Claude 3.5 Sonnet consistently outperforms all models, but surprisingly, the smaller models often approach human-rated empathy levels. Demographic analyses show that older adults favored responses that validated distress before offering support (p = 0.004), while graduate-educated users preferred emotionally layered replies in specific scenarios. Gender-based differences were minimal (p > 0.15), suggesting the feasibility of broadly empathetic model designs. This work offers insights into building resource-efficient, emotionally intelligent systems for mental health support.
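The abstract scores small-model outputs by cosine similarity against frontier-model reference responses. The sketch below shows only that metric. It assumes each response has already been converted into a fixed-length embedding vector by some sentence encoder; the encoder the authors used is not named here. The helper `mean_similarity` and the toy 3-dimensional vectors are hypothetical and stand in for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def mean_similarity(candidates: Sequence[Sequence[float]],
                    references: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper: average similarity of each candidate embedding
    to the reference embedding at the same position."""
    if not candidates or len(candidates) != len(references):
        raise ValueError("need equally sized, non-empty lists of embeddings")
    scores = [cosine_similarity(c, r) for c, r in zip(candidates, references)]
    return sum(scores) / len(scores)


# Toy vectors standing in for embeddings of frontier-model references
# and of small-model replies before and after fine-tuning (invented values).
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
zero_shot = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
fine_tuned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(mean_similarity(zero_shot, reference))   # 0.5
print(mean_similarity(fine_tuned, reference))  # 1.0
```

A higher mean score means the small model's replies point in a direction closer to the reference replies in embedding space. The metric measures semantic closeness to the frontier model, not empathy itself, which is presumably why the paper also reports perceived empathy from human raters.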

