A Turing Test: Are AI Chatbots Behaviorally Similar to Humans?
This work assesses how closely the behavior of AI chatbots resembles that of humans, a question relevant to both researchers and developers. It provides empirical evidence that chatbots can exhibit human-like traits, though the contribution is incremental in that it applies existing behavioral tests to new AI models.
The study administered a Turing Test to AI chatbots, including ChatGPT-4, using classic behavioral games and a Big-5 personality survey. ChatGPT-4's behavior and personality traits were statistically indistinguishable from those of a randomly drawn human in a pool of tens of thousands of subjects from more than 50 countries. Where chatbot behavior differed from average and modal human behavior, it tended toward the more altruistic and cooperative end of the distribution.
We administer a Turing Test to AI chatbots. We examine how chatbots behave in a suite of classic behavioral games that are designed to elicit characteristics such as trust, fairness, risk-aversion, cooperation, \textit{etc.}, as well as how they respond to a traditional Big-5 psychological survey that measures personality traits. ChatGPT-4 exhibits behavioral and personality traits that are statistically indistinguishable from those of a random human drawn from tens of thousands of human subjects from more than 50 countries. Chatbots also modify their behavior based on previous experience and contexts ``as if'' they were learning from the interactions, and change their behavior in response to different framings of the same strategic situation. Their behaviors are often distinct from average and modal human behaviors, in which case they tend to behave on the more altruistic and cooperative end of the distribution. We estimate that they act as if they are maximizing an average of their own and their partner's payoffs.
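
The closing claim of the abstract, that chatbots act as if they maximize an average of their own and their partner's payoffs, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the prisoner's dilemma payoff numbers, the weight name `b`, and the helpers `weighted_payoff` and `best_action` are hypothetical and are not taken from the study, whose estimation procedure is not described in the abstract. The sketch only shows why an agent that weights both payoffs equally (`b = 0.5`) ends up on the cooperative end, while a purely self-interested agent (`b = 1.0`) does not.

```python
# Hypothetical prisoner's dilemma payoffs (illustrative only, not from the study).
# Key: (my action, partner's action) -> (my payoff, partner's payoff)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def weighted_payoff(own, partner, b):
    """Assumed objective: b * own + (1 - b) * partner; b = 0.5 is the plain average."""
    return b * own + (1 - b) * partner


def best_action(partner_action, b):
    """Return the action that maximizes the weighted payoff against a fixed partner action."""
    return max(
        ("C", "D"),
        key=lambda action: weighted_payoff(*PAYOFFS[(action, partner_action)], b),
    )


if __name__ == "__main__":
    for b in (1.0, 0.5):
        choices = {p: best_action(p, b) for p in ("C", "D")}
        print(f"b = {b}: best responses {choices}")
```

Under these assumed payoffs, the equal-weight agent cooperates whatever the partner does, because cooperation always yields the higher average payoff, whereas the self-interested agent always defects. Different payoff numbers could change this outcome, so the example should be read as a qualitative illustration rather than a reproduction of the paper's estimate.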