CL May 31, 2023

Revisiting the Reliability of Psychological Scales on Large Language Models

Jen-tse Huang, Wenxiang Jiao, Man Ho Lam, Eric John Li, Wenxuan Wang, Michael R. Lyu

arXiv:2305.19926v58.125 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses a question relevant to researchers in psychology and AI: whether LLMs can validly stand in for human subjects in personality assessments. Its contribution is incremental, refining existing methods rather than introducing a new one.

The study assessed the reliability of applying human-designed personality tests to large language models (LLMs). It found that GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1 responded consistently to the Big Five Inventory across 2,500 settings per model, indicating satisfactory reliability. It also showed that GPT-3.5 can emulate diverse personalities when given specific prompt instructions, which suggests that LLMs could reduce costs in the social sciences by substituting for human participants.

Recent research has focused on examining Large Language Models' (LLMs) characteristics from a psychological standpoint, acknowledging the necessity of understanding their behavioral characteristics. The administration of personality tests to LLMs has emerged as a noteworthy area in this context. However, the suitability of employing psychological scales, initially devised for humans, on LLMs is a matter of ongoing debate. Our study aims to determine the reliability of applying personality assessments to LLMs, explicitly investigating whether LLMs demonstrate consistent personality traits. Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory, indicating a satisfactory level of reliability. Furthermore, our research explores the potential of GPT-3.5 to emulate diverse personalities and represent various groups, a capability increasingly sought after in social sciences for substituting human participants with LLMs to reduce costs. Our findings reveal that LLMs have the potential to represent different personalities with specific prompt instructions.
