AIDec 14, 2025

Large Language Newsvendor: Decision Biases and Cognitive Mechanisms

arXiv:2512.12552v12 citations
Originality Incremental advance
AI Analysis

This research addresses the risk of LLMs amplifying biases in business decision-making, particularly for supply chain managers, but it is incremental as it applies existing bias frameworks to LLMs in a new context.

The study investigated how large language models (LLMs) replicate and amplify human cognitive biases in high-stakes operational decisions like supply chain management, finding that LLMs consistently showed biases such as 'Too Low/Too High' ordering and amplified demand-chasing behavior, with GPT-4 demonstrating the greatest irrationality while GPT-4o performed near-optimally.

Problem definition: Although large language models (LLMs) are increasingly integrated into business decision making, their potential to replicate and even amplify human cognitive biases cautions a significant, yet not well-understood, risk. This is particularly critical in high-stakes operational contexts like supply chain management. To address this, we investigate the decision-making patterns of leading LLMs using the canonical newsvendor problem in a dynamic setting, aiming to identify the nature and origins of their cognitive biases. Methodology/results: Through dynamic, multi-round experiments with GPT-4, GPT-4o, and LLaMA-8B, we tested for five established decision biases. We found that LLMs consistently replicated the classic ``Too Low/Too High'' ordering bias and significantly amplified other tendencies like demand-chasing behavior compared to human benchmarks. Our analysis uncovered a ``paradox of intelligence'': the more sophisticated GPT-4 demonstrated the greatest irrationality through overthinking, while the efficiency-optimized GPT-4o performed near-optimally. Because these biases persist even when optimal formulas are provided, we conclude they stem from architectural constraints rather than knowledge gaps. Managerial implications: First, managers should select models based on the specific task, as our results show that efficiency-optimized models can outperform more complex ones on certain optimization problems. Second, the significant amplification of bias by LLMs highlights the urgent need for robust human-in-the-loop oversight in high-stakes decisions to prevent costly errors. Third, our findings suggest that designing structured, rule-based prompts is a practical and effective strategy for managers to constrain models' heuristic tendencies and improve the reliability of AI-assisted decisions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes