CLCY Aug 11, 2025

Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge

arXiv:2508.08236v1
2 citations
h-index: 18
Originality: Incremental advance
AI Analysis

The paper addresses safety evaluation in ethically sensitive mental health interactions, a problem relevant to both researchers and practitioners. The advance is incremental: it adapts existing LLM-as-Judge methods to a specific domain.

The authors propose PsyCrisis-Bench, a reference-free benchmark that uses an LLM-as-Judge approach to evaluate the safety alignment of LLM responses in Chinese mental health dialogues. In experiments on 3600 judgments, the method achieves higher agreement with expert assessments than the existing approaches it is compared against.

Evaluating the safety alignment of LLM responses in high-risk mental health dialogues is particularly difficult due to missing gold-standard answers and the ethically sensitive nature of these interactions. To address this challenge, we propose PsyCrisis-Bench, a reference-free evaluation benchmark based on real-world Chinese mental health dialogues. It evaluates whether the model responses align with the safety principles defined by experts. Specifically designed for settings without standard references, our method adopts a prompt-based LLM-as-Judge approach that conducts in-context evaluation using expert-defined reasoning chains grounded in psychological intervention principles. We employ binary point-wise scoring across multiple safety dimensions to enhance the explainability and traceability of the evaluation. Additionally, we present a manually curated, high-quality Chinese-language dataset covering self-harm, suicidal ideation, and existential distress, derived from real-world online discourse. Experiments on 3600 judgments show that our method achieves the highest agreement with expert assessments and produces more interpretable evaluation rationales compared to existing approaches. Our dataset and evaluation tool are publicly available to facilitate further research.
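To make the binary point-wise scoring concrete, here is a minimal Python sketch of how per-dimension 0/1 judgments could be summed into a score and compared against expert labels. The dimension names, the data, and the use of raw agreement as the metric are all illustrative assumptions: the abstract names neither the actual safety dimensions nor the agreement statistic the paper reports.

```python
from typing import Dict, List

# Hypothetical dimension names; the abstract does not list the real ones.
DIMENSIONS = ["dim_a", "dim_b", "dim_c"]

Judgment = Dict[str, int]  # maps a dimension name to a binary score (0 or 1)


def total_score(judgment: Judgment) -> int:
    """Sum the binary per-dimension scores into one point-wise total."""
    for dim in DIMENSIONS:
        if judgment[dim] not in (0, 1):
            raise ValueError(f"score for {dim!r} must be 0 or 1")
    return sum(judgment[dim] for dim in DIMENSIONS)


def agreement_rate(judge: List[Judgment], expert: List[Judgment]) -> float:
    """Fraction of (response, dimension) judgments where judge and expert match.

    Raw agreement is an assumed stand-in for whatever statistic the paper uses.
    """
    if len(judge) != len(expert):
        raise ValueError("judge and expert lists must have the same length")
    matches = 0
    total = 0
    for j, e in zip(judge, expert):
        for dim in DIMENSIONS:
            total += 1
            matches += int(j[dim] == e[dim])
    return matches / total if total else 0.0


# Made-up example data: two responses, each scored on three dimensions.
judge = [
    {"dim_a": 1, "dim_b": 0, "dim_c": 1},
    {"dim_a": 1, "dim_b": 1, "dim_c": 0},
]
expert = [
    {"dim_a": 1, "dim_b": 1, "dim_c": 1},
    {"dim_a": 1, "dim_b": 1, "dim_c": 0},
]

print(total_score(judge[0]))          # total for the first response
print(agreement_rate(judge, expert))  # share of matching binary judgments
```

Keeping each dimension as a separate binary decision means every disagreement with an expert can be traced to a specific dimension, which is the kind of explainability and traceability the abstract attributes to this scoring scheme.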
