MA AI CY · Aug 27, 2025

Validating Generative Agent-Based Models for Logistics and Supply Chain Management Research

arXiv:2508.20234v1 · h-index: 6

Originality: Incremental advance

AI Analysis

The paper provides a validation framework that LSCM researchers can use to develop GABMs rigorously, addressing a specific methodological gap in the field.

This study addresses how to validate Generative Agent-Based Models (GABMs) powered by large language models (LLMs) for logistics and supply chain management research, testing six LLMs against 957 human participants in food delivery scenarios. The results show that GABMs can simulate human behaviors effectively, but they also expose a paradox: some LLMs achieved surface-level equivalence while exhibiting artificial decision processes not found in humans.

Generative Agent-Based Models (GABMs) powered by large language models (LLMs) offer promising potential for empirical logistics and supply chain management (LSCM) research by enabling realistic simulation of complex human behaviors. Unlike traditional agent-based models, GABMs generate human-like responses through natural language reasoning, which creates potential for new perspectives on emergent LSCM phenomena. However, the validity of LLMs as proxies for human behavior in LSCM simulations is unknown. This study evaluates whether LLM behavior is equivalent to human behavior through a controlled experiment examining dyadic customer-worker engagements in food delivery scenarios. I test six state-of-the-art LLMs against 957 human participants (477 dyads) using a moderated mediation design. The study identifies a need to validate GABMs on two levels: (1) human equivalence testing, and (2) decision process validation. Results indicate that GABMs can effectively simulate human behaviors in LSCM; however, an equivalence-versus-process paradox emerges. While a series of Two One-Sided Tests (TOST) for equivalence shows that some LLMs demonstrate surface-level equivalence to humans, structural equation modeling (SEM) uncovers artificial decision processes in some LLMs that are not present in human participants. These findings position GABMs as a potentially viable methodological instrument in LSCM, provided proper validation checks are applied. The dual-validation framework also provides LSCM researchers with a guide to rigorous GABM development. For practitioners, this study offers an evidence-based assessment to inform LLM selection for operational tasks.
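
To make the first validation level concrete, here is a minimal sketch of a TOST equivalence check on the difference between an LLM sample mean and a human sample mean. It is an illustration only, not the paper's analysis code. The function name `tost_equivalence`, the equivalence bounds `low` and `high`, and the use of a large-sample normal approximation (rather than a t-distribution) are all assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_equivalence(human, llm, low, high, alpha=0.05):
    """Two One-Sided Tests on the mean difference (llm - human).

    Assumptions for this sketch: a large-sample normal approximation with
    unpooled variances, and equivalence bounds `low` < 0 < `high` given in
    the raw units of the outcome. Returns (p_lower, p_upper, equivalent).
    """
    diff = mean(llm) - mean(human)
    se = sqrt(variance(llm) / len(llm) + variance(human) / len(human))
    std_normal = NormalDist()
    # First one-sided test, H0: diff <= low.
    p_lower = 1.0 - std_normal.cdf((diff - low) / se)
    # Second one-sided test, H0: diff >= high.
    p_upper = std_normal.cdf((diff - high) / se)
    # Equivalence is claimed only if BOTH null hypotheses are rejected.
    return p_lower, p_upper, max(p_lower, p_upper) < alpha


if __name__ == "__main__":
    # Hypothetical ratings on a 1-7 scale, not data from the study.
    human_scores = [3, 4, 5, 4, 3, 5, 4] * 30
    llm_scores = [3, 4, 5, 4, 4, 5, 4] * 30
    print(tost_equivalence(human_scores, llm_scores, low=-0.5, high=0.5))
```

Passing this check establishes only that the two means fall within the chosen bounds. As the abstract notes, the second level (comparing the decision process, for example via SEM path estimates) has to be examined separately.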
