CL · Mar 7, 2025

Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter

arXiv:2503.05362v3 · 14 citations · h-index: 22 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the need for better AI emotional support for users facing emotional stress. It is an incremental improvement in a domain-specific application.

The paper tackles low strategy selection accuracy and preference bias in Large Language Models used for Emotional Support Conversations. It proposes Chain-of-Strategy Optimization to address both problems, and its experiments show that the method outperforms standard supervised fine-tuning on models such as LLaMA-3.1-8B.

The growing emotional stress in modern society has increased the demand for Emotional Support Conversations (ESC). While Large Language Models (LLMs) show promise for ESC, they face two key challenges: (1) low strategy selection accuracy, and (2) preference bias, limiting their adaptability to emotional needs of users. Existing supervised fine-tuning (SFT) struggles to address these issues, as it rigidly trains models on single gold-standard responses without modeling nuanced strategy trade-offs. To overcome these limitations, we propose Chain-of-Strategy Optimization (CSO), a novel approach that optimizes strategy selection preferences at each dialogue turn. We first leverage Monte Carlo Tree Search to construct ESC-Pro, a high-quality preference dataset with turn-level strategy-response pairs. Training on ESC-Pro with CSO improves both strategy accuracy and bias mitigation, enabling LLMs to generate more empathetic and contextually appropriate responses. Experiments on LLaMA-3.1-8B, Gemma-2-9B, and Qwen2.5-7B demonstrate that CSO outperforms standard SFT, highlighting the efficacy of fine-grained, turn-level preference modeling in ESC.
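
The abstract does not spell out the CSO training objective, so the following is only a minimal sketch of what turn-level preference modeling could look like, assuming a DPO-style pairwise loss applied to each dialogue turn. The `TurnPreference` schema, the function name, and the `beta` value are hypothetical illustrations, not the ESC-Pro format or the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnPreference:
    """One turn-level preference pair (hypothetical schema, not ESC-Pro's format)."""
    context: str            # dialogue history up to this turn
    chosen_strategy: str    # strategy behind the preferred response
    chosen_response: str
    rejected_strategy: str  # strategy behind the dispreferred response
    rejected_response: str


def pairwise_preference_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed temperature; not taken from the paper
) -> float:
    """DPO-style loss for one pair: -log(sigmoid(beta * margin)).

    The margin measures how much more the policy prefers the chosen
    strategy-response pair over the rejected one, relative to a frozen
    reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Example pair for a single turn (invented text, for illustration only)
pair = TurnPreference(
    context="User: I failed my exam and feel worthless.",
    chosen_strategy="Reflection of Feelings",
    chosen_response="It sounds like this result hit you really hard.",
    rejected_strategy="Providing Suggestions",
    rejected_response="You should make a study plan.",
)

# Log-probabilities would come from the policy and reference models;
# the numbers here are placeholders.
loss = pairwise_preference_loss(-12.0, -15.0, -13.0, -13.5)
```

Under this assumption, the loss falls below log 2 whenever the policy favors the chosen pair more strongly than the reference model does, and averaging it over every turn in a dialogue gives the fine-grained, per-turn signal that the abstract contrasts with training on a single gold-standard response.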
