CL | AI | Mar 1, 2023

Can ChatGPT Assess Human Personalities? A General Evaluation Framework

arXiv:2303.01248v3 | 167 citations | h-index: 70
Originality: Incremental advance
AI Analysis

This work makes an incremental advance on using LLMs for human personality analysis, which could benefit psychologists and AI researchers interested in psychological applications.

The paper asks whether large language models such as ChatGPT can assess human personalities. It proposes a generic evaluation framework based on MBTI tests and shows that ChatGPT gives more consistent and fairer assessments than InstructGPT, although it is less robust against prompt biases.

Large Language Models (LLMs), especially ChatGPT, have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored. Existing works study the virtual personalities of LLMs but rarely explore the possibility of analyzing human personalities via LLMs. This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests. Specifically, we first devise unbiased prompts by randomly permuting options in MBTI questions and adopt the average testing result to encourage more impartial answer generation. Then, we propose to replace the subject in question statements to enable flexible queries and assessments on different subjects from LLMs. Finally, we re-formulate the question instructions as correctness evaluations to help LLMs generate clearer responses. The proposed framework enables LLMs to flexibly assess personalities of different groups of people. We further propose three evaluation metrics to measure the consistency, robustness, and fairness of assessment results from state-of-the-art LLMs including ChatGPT and GPT-4. Our experiments reveal ChatGPT's ability to assess human personalities, and the average results demonstrate that it can achieve more consistent and fairer assessments, in spite of lower robustness against prompt biases, compared with InstructGPT.
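
The three prompt-design steps in the abstract (permuting options, replacing the subject, and asking for a correctness evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: the option set, the scores, the sample statement, the subject "Artists", and the helper names `build_prompts`, `first_listed`, and `averaged_score` are all assumptions made for illustration. The sketch also enumerates every option ordering instead of sampling random permutations, so that the toy result is deterministic. A stand-in "model" that always picks the first listed option plays the role of a position-biased LLM.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence

# Hypothetical option set and scores; the real MBTI questionnaire uses its own scale.
OPTIONS = ("Agree", "Neutral", "Disagree")
SCORES = {"Agree": 1.0, "Neutral": 0.0, "Disagree": -1.0}


def build_prompts(statement: str, subject: str,
                  options: Sequence[str] = OPTIONS) -> List[str]:
    """Build one correctness-evaluation prompt per ordering of the options."""
    # Subject replacement: swap the leading "You" for the group being assessed.
    question = statement.replace("You", subject, 1)
    return [
        f'Evaluate the correctness of the statement: "{question}"\n'
        f"Options: {', '.join(order)}"
        for order in permutations(options)
    ]


def first_listed(prompt: str) -> str:
    """Toy stand-in for a position-biased LLM: always picks the first option."""
    return prompt.split("Options: ")[1].split(", ")[0]


def averaged_score(prompts: Sequence[str], ask: Callable[[str], str],
                   scores: Dict[str, float] = SCORES) -> float:
    """Query every permuted prompt and average the numeric answers."""
    answers = [ask(p) for p in prompts]
    return sum(scores[a] for a in answers) / len(answers)


prompts = build_prompts("You enjoy lively social events.", "Artists")
biased = SCORES[first_listed(prompts[0])]          # single fixed ordering
averaged = averaged_score(prompts, first_listed)   # all orderings, averaged
print(biased, averaged)
```

With a single fixed ordering, the biased stand-in always reports agreement; averaged over all orderings, its position preference cancels out. A real LLM call would replace `first_listed`, and random sampling of orderings would replace full enumeration when the option set is larger.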

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
