Dec 12, 2023

Humans vs Large Language Models: Judgmental Forecasting in an Era of Advanced AI

arXiv:2312.06941v2, 25 citations, h-index: 6, Int J Forecast
Originality: Synthesis-oriented
AI Analysis

This research addresses the practical integration of LLMs into forecasting processes for retail decision-makers, highlighting limitations and incremental insights.

The study compared forecasting accuracy between human experts and five Large Language Models (LLMs) in retail, finding that LLMs did not consistently outperform humans, with both showing increased errors during promotional periods and positive external impacts.

This study investigates the forecasting accuracy of human experts versus Large Language Models (LLMs) in the retail sector, particularly during standard and promotional sales periods. Utilizing a controlled experimental setup with 123 human forecasters and five LLMs, including ChatGPT4, ChatGPT3.5, Bard, Bing, and Llama2, we evaluated forecasting precision through Mean Absolute Percentage Error. Our analysis centered on the effect of the following factors on forecasters' performance: the supporting statistical model (baseline and advanced), whether the product was on promotion, and the nature of external impact. The findings indicate that LLMs do not consistently outperform humans in forecasting accuracy and that advanced statistical forecasting models do not uniformly enhance the performance of either human forecasters or LLMs. Both human and LLM forecasters exhibited increased forecasting errors, particularly during promotional periods and under the influence of positive external impacts. Our findings call for careful consideration when integrating LLMs into practical forecasting processes.
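
For readers unfamiliar with the accuracy metric named in the abstract, the following is a minimal sketch of how Mean Absolute Percentage Error (MAPE) is commonly computed. It assumes the standard textbook definition, the mean of |actual - forecast| / |actual| expressed in percent. The abstract does not specify the exact variant the authors used, and the function name and sales figures below are hypothetical illustrations, not data from the study.

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error, in percent (standard definition, assumed)."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    if not actuals:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actuals):
        # MAPE is undefined when an actual value is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - f) / a) for a, f in zip(actuals, forecasts))
    return 100.0 * total / len(actuals)


# Hypothetical weekly unit sales and forecasts (illustrative only).
actual_sales = [100.0, 200.0, 400.0]
forecast_sales = [110.0, 180.0, 400.0]

score = mape(actual_sales, forecast_sales)
print(f"MAPE: {score:.2f}%")
```

Because each error is scaled by the actual value, the same absolute miss weighs more heavily on a low-volume product than on a high-volume one. This is worth keeping in mind when comparing forecasters across promotional and non-promotional periods.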

