CL CE GN · Nov 4, 2025

Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas

arXiv:2511.02458v1 · 1 citation · h-index: 3 · ICAIF

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of efficient and accurate macroeconomic forecasting for policymakers and economists. The contribution is incremental: it builds on existing LLM methods without a major breakthrough.

The study evaluates whether persona-based prompting improves GPT-4o's macroeconomic forecasting accuracy, and how GPT-4o compares with human experts. It finds no measurable advantage from personas. GPT-4o achieves competitive accuracy, with modest differences from human forecasters, across 50 quarterly rounds that include an out-of-sample period (2024-2025).

We evaluate whether persona-based prompting improves Large Language Model (LLM) performance on macroeconomic forecasting tasks. Using 2,368 economics-related personas from the PersonaHub corpus, we prompt GPT-4o to replicate the ECB Survey of Professional Forecasters across 50 quarterly rounds (2013-2025). We compare the persona-prompted forecasts against the human expert panel across four target variables (HICP, core HICP, GDP growth, unemployment) and four forecast horizons. We also compare the results against 100 baseline forecasts without persona descriptions to isolate the effect of the personas. We report two main findings. First, GPT-4o and human forecasters achieve remarkably similar accuracy levels, with differences that are statistically significant yet practically modest. Our out-of-sample evaluation on 2024-2025 data demonstrates that GPT-4o can maintain competitive forecasting performance on unseen events, though with notable differences compared to the in-sample period. Second, our ablation experiment reveals no measurable forecasting advantage from persona descriptions, suggesting these prompt components can be omitted to reduce computational costs without sacrificing accuracy. Our results provide evidence that GPT-4o can achieve competitive forecasting accuracy even on out-of-sample macroeconomic events if provided with relevant context data, while revealing that diverse prompts produce remarkably homogeneous forecasts compared to human panels.
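The ablation design described in the abstract (the same forecasting prompt with and without a persona description, scored against the realized value) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `build_prompt`, `mean_absolute_error`, and `dispersion`, the prompt wording, the choice of mean absolute error and standard deviation as metrics, and all numeric values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def build_prompt(context: str, persona: Optional[str] = None) -> str:
    """Assemble a forecasting prompt (hypothetical wording).

    Passing persona=None gives the baseline arm of the ablation.
    """
    parts = []
    if persona:
        parts.append(f"You are the following forecaster: {persona}")
    parts.append(context)
    parts.append("Give your point forecast as a single number in percent.")
    return "\n\n".join(parts)


def mean_absolute_error(forecasts: Sequence[float], realized: float) -> float:
    """Average absolute gap between each forecast and the realized value."""
    return mean([abs(f - realized) for f in forecasts])


def dispersion(forecasts: Sequence[float]) -> float:
    """Population standard deviation, used as a simple homogeneity measure."""
    return pstdev(forecasts)


if __name__ == "__main__":
    # Made-up numbers for illustration only; NOT results from the paper.
    realized = 2.0
    panels = {
        "persona": [2.1, 2.2, 2.1, 2.3],
        "baseline": [2.2, 2.1, 2.2, 2.1],
        "human": [1.5, 2.4, 2.9, 1.8],
    }
    for name, forecasts in panels.items():
        mae = mean_absolute_error(forecasts, realized)
        spread = dispersion(forecasts)
        print(f"{name}: MAE={mae:.3f}, dispersion={spread:.3f}")
```

In a setup like this, comparing the error of the persona and baseline arms isolates the effect of the persona text, while comparing dispersion across arms and the human panel gives one simple way to quantify how homogeneous the forecasts are.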
