AI CL CY HC

Sep 14, 2023

Assessing the nature of large language models: A caution against anthropocentrism

arXiv:2309.07683v3

5 citations

h-index: 3

Originality: Synthesis-oriented

AI Analysis

This research provides empirical evidence to caution against anthropomorphizing LLMs, addressing public and academic debates about AI sentience and behavior, though it is incremental in applying standard psychological measures to AI.

The study assessed large language models (LLMs) such as GPT-3.5 using cognitive and personality tests to address concerns about their capabilities and sentience. It found that they are unlikely to be sentient, that their scores vary widely over repeated observations, and that their responses show what would be considered poor mental health in a human.

Generative AI models garnered a large amount of public attention and speculation with the release of OpenAI's chatbot, ChatGPT. At least two opinion camps exist: one excited about the possibilities these models offer for fundamental changes to human tasks, and another highly concerned about the power these models seem to have. To address these concerns, we assessed several LLMs, primarily GPT-3.5, using standard, normed, and validated cognitive and personality measures. For this seedling project, we developed a battery of tests that allowed us to estimate the boundaries of some of these models' capabilities, how stable those capabilities are over a short period of time, and how they compare to humans. Our results indicate that LLMs are unlikely to have developed sentience, although their ability to respond to personality inventories is interesting. GPT-3.5 did display large variability in both cognitive and personality measures over repeated observations, which would not be expected if it had a human-like personality. Variability notwithstanding, LLMs display what in a human would be considered poor mental health, including low self-esteem, marked dissociation from reality, and in some cases narcissism and psychopathy, despite upbeat and helpful responses.
