Evaluating LLM-persona Generated Distributions for Decision-making
This paper addresses the need for reliable methods to evaluate LLM-generated distributions used in business decisions such as pricing and inventory management, but the contribution is incremental because it focuses on specific metrics and problems.
The paper tackles the problem of evaluating LLM-generated distributions for decision-making. It finds that these distributions are practically useful, especially in low-data regimes, and that decision-agnostic metrics such as Wasserstein distance can be misleading.
LLMs can generate a wealth of data, ranging from simulated personas imitating human valuations and preferences, to demand forecasts based on world knowledge. But how well do such LLM-generated distributions support downstream decision-making? For example, when pricing a new product, a firm could prompt an LLM to simulate how much consumers are willing to pay based on a product description, but how useful is the resulting distribution for optimizing the price? We refer to this approach as LLM-SAA, in which an LLM is used to construct an estimated distribution and the decision is then optimized under that distribution. In this paper, we study metrics to evaluate the quality of these LLM-generated distributions, based on the decisions they induce. Taking three canonical decision-making problems (assortment optimization, pricing, and newsvendor) as examples, we find that LLM-generated distributions are practically useful, especially in low-data regimes. We also show that decision-agnostic metrics such as Wasserstein distance can be misleading when evaluating these distributions for decision-making.
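To make the LLM-SAA idea concrete, the sketch below works through the pricing case with toy numbers. It is an illustration under stated assumptions, not the paper's implementation: the function names (`saa_price`, `expected_revenue`, `wasserstein_1`, `decision_regret`) and all sample values are hypothetical, and the "LLM-generated" samples are hand-picked stand-ins rather than real model output. The price is optimized against an estimated willingness-to-pay sample, and the resulting decision is then scored against the true sample.

```python
import numpy as np

def saa_price(samples):
    """Revenue-maximizing price under the empirical distribution of WTP samples."""
    samples = np.asarray(samples, dtype=float)
    # Under an empirical distribution, an optimal price sits at a sample value.
    candidates = np.unique(samples)
    revenue = [p * np.mean(samples >= p) for p in candidates]
    return float(candidates[int(np.argmax(revenue))])

def expected_revenue(price, samples):
    """Price times the fraction of customers whose WTP is at least the price."""
    samples = np.asarray(samples, dtype=float)
    return float(price * np.mean(samples >= price))

def wasserstein_1(x, y):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))

def decision_regret(estimated, truth):
    """Revenue lost on the true sample by pricing against the estimated one."""
    p_hat = saa_price(estimated)
    p_star = saa_price(truth)
    return expected_revenue(p_star, truth) - expected_revenue(p_hat, truth)

# Toy data (assumed for illustration only, not taken from the paper).
truth = [4, 4, 10, 10]    # true willingness to pay
cand_a = [4, 4, 7, 7]     # closer in Wasserstein distance, but induces the wrong price
cand_b = [0, 0, 10, 10]   # farther in Wasserstein distance, but induces the right price

for name, cand in [("A", cand_a), ("B", cand_b)]:
    print(name, wasserstein_1(cand, truth), decision_regret(cand, truth))
```

In this toy setup, candidate A is closer to the truth in Wasserstein distance yet leads to a lower-revenue price, while candidate B is farther away yet recovers the optimal price. This is the kind of disagreement between a decision-agnostic metric and a decision-based one that the abstract describes.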