CL | May 16, 2024

CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations

Jiahao Zhao, Jingwei Zhu, Minghuan Tan, Min Yang, Renhao Li, Di Yang, Chenhao Zhang, Guancheng Ye, Chengming Li, Xiping Hu, Derek F. Wong

arXiv:2405.10212v312.220 citations | h-index: 16 | Has Code | COLING

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better evaluation of psychology understanding in LLMs, particularly for Chinese language applications, though it is incremental as it adapts existing benchmark methods to a new domain.

The authors introduced CPsyExam, a Chinese benchmark for evaluating psychology knowledge in large language models, built from 4k exam questions selected from a pool of 22k and designed to assess psychological knowledge and case analysis separately. They evaluated a range of open-source and API-based LLMs and reported that CPsyExam serves as an effective benchmark for psychology understanding in LLMs and enables model comparisons across granularities.

In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese language examinations. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. From the pool of 22k questions, we utilize 4k to create the benchmark that offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques. Furthermore, we evaluate a range of existing large language models (LLMs), spanning from open-sourced to API-based models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.

