CL | Nov 15, 2023

PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models

Haoan Jin, Siyuan Chen, Dilawaier Dilixiati, Yewei Jiang, Mengyue Wu, Kenny Q. Zhu

arXiv:2311.09189v23.99 citations | h-index: 8 | Has Code

Originality: Synthesis-oriented

AI Analysis

PsyEval gives researchers and developers working on LLMs in mental health a specialized evaluation tool, though the contribution is incremental: it focuses on benchmarking rather than on novel model improvements.

The paper tackles the problem of evaluating large language models (LLMs) in the mental health domain by introducing PsyEval, a comprehensive suite of tasks, and finds that current LLMs show significant room for improvement in this area.

Evaluating Large Language Models (LLMs) in the mental health domain poses challenges distinct from those in other domains, given the subtle and highly subjective nature of symptoms, which vary significantly among individuals. This paper presents PsyEval, the first comprehensive suite of mental health-related tasks for evaluating LLMs. PsyEval encompasses five sub-tasks that evaluate three critical dimensions of mental health. This comprehensive framework is designed to thoroughly assess the unique challenges and intricacies of mental health-related tasks, making PsyEval a highly specialized and valuable tool for evaluating LLM performance in this domain. We evaluate twelve advanced LLMs using PsyEval. Experimental results not only demonstrate significant room for improvement in current LLMs concerning mental health but also reveal potential directions for future model optimization.
