Assessing Large Language Models' ability to predict how humans balance self-interest and the interest of others
This work addresses the reliability of AI in social decision-making, a concern for both developers and users. It highlights a bias that could lead to suboptimal outcomes in public policy or business. The contribution is incremental, as it evaluates existing models on new data.
The study assessed three advanced chatbots' ability to predict human decisions in dictator games across 108 experiments, finding that only GPT-4 correctly identified qualitative behavioral patterns (self-interested, inequity-averse, altruistic), but it consistently underestimated self-interest and inequity-aversion while overestimating altruism.
Generative artificial intelligence (AI) holds enormous potential to revolutionize decision-making processes, from everyday to high-stakes scenarios. By leveraging generative AI, humans can benefit from data-driven insights and predictions, enhancing their ability to make informed decisions that consider a wide array of factors and potential outcomes. However, because many decisions carry social implications, AI can be a reliable assistant for decision-making only if it captures the balance between self-interest and the interest of others. We investigate the ability of three of the most advanced chatbots to predict dictator game decisions across 108 experiments with human participants from 12 countries. We find that only GPT-4 (but not Bard or Bing) correctly captures qualitative behavioral patterns, identifying three major classes of behavior: self-interested, inequity-averse, and fully altruistic. Nonetheless, GPT-4 consistently underestimates self-interest and inequity-aversion, while overestimating altruistic behavior. This bias has significant implications for AI developers and users, as overly optimistic expectations about human altruism may lead to disappointment, frustration, suboptimal decisions in public policy or business contexts, and even social conflict.
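To make the three behavioral classes and the direction of the reported bias concrete, the following is a minimal sketch, not the authors' analysis. It assumes each decision is recorded as the share of the endowment given away, and that the classes correspond to giving nothing (self-interested), splitting equally (inequity-averse), and giving everything (fully altruistic). The function names, the `other` label, and the tolerance parameter are hypothetical.

```python
from collections import Counter

# Assumed labels; "other" collects choices outside the three named classes.
LABELS = ("self-interested", "inequity-averse", "fully altruistic", "other")


def classify_allocation(share_given: float, tol: float = 1e-9) -> str:
    """Label one dictator game choice by the share of the endowment given away."""
    if not 0.0 <= share_given <= 1.0:
        raise ValueError("share_given must lie in [0, 1]")
    if abs(share_given) <= tol:
        return "self-interested"   # assumption: keeps the whole endowment
    if abs(share_given - 0.5) <= tol:
        return "inequity-averse"   # assumption: splits the endowment equally
    if abs(share_given - 1.0) <= tol:
        return "fully altruistic"  # assumption: gives the whole endowment away
    return "other"


def class_shares(shares_given):
    """Return the fraction of choices that fall into each class."""
    choices = list(shares_given)
    if not choices:
        raise ValueError("at least one choice is required")
    counts = Counter(classify_allocation(s) for s in choices)
    return {label: counts.get(label, 0) / len(choices) for label in LABELS}


def prediction_bias(predicted, observed):
    """Predicted minus observed class frequency; positive means overestimation."""
    pred = class_shares(predicted)
    obs = class_shares(observed)
    return {label: pred[label] - obs[label] for label in LABELS}
```

Under these assumptions, the pattern described in the abstract would show up as negative bias for the self-interested and inequity-averse classes and positive bias for the fully altruistic class.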