Influence of Solution Efficiency and Valence of Instruction on Additive and Subtractive Solution Strategies in Humans and GPT-4
The study addresses biases in LLMs such as GPT-4, which matters for users who rely on them in decision-making, but its contribution is incremental because it extends prior research on addition biases.
This study compared human and GPT-4 problem-solving on spatial and linguistic tasks and found a stronger tendency towards additive transformations in GPT-4 than in humans. GPT-4 used additive strategies more often when subtraction was more efficient and when instructions were positive ("improve") rather than neutral ("edit").
Generative artificial intelligences, particularly large language models (LLMs), play an increasingly prominent role in human decision-making contexts, necessitating transparency about their capabilities. While prior studies have shown addition biases in humans (Adams et al., 2021) and OpenAI's GPT-3 (Winter et al., 2023), this study extends the research by comparing human and GPT-4 problem-solving across both spatial and linguistic tasks, with variations in solution efficiency and valence of task instruction. Four preregistered experiments with 588 participants from the U.S. and 680 GPT-4 iterations revealed a stronger tendency towards additive transformations in GPT-4 than in humans. Human participants were less likely to use additive strategies when subtraction was relatively more efficient than when addition and subtraction were equally efficient. GPT-4 exhibited the opposite behavior, with a strong addition bias when subtraction was more efficient. In terms of valence of task instruction, GPT-4's use of additive strategies increased when instructed to "improve" (positive) rather than "edit" (neutral). These findings demonstrate that biases in human problem-solving are amplified in GPT-4, and that LLM behavior differs from human efficiency-based strategies. This highlights the limitations of LLMs and the need for caution when using them in real-world applications.