A Survey of Reinforcement Learning For Economics

arXiv:2603.08956 · 27.11 citations · h-index: 2

AI Analysis

This survey introduces reinforcement learning methods to economists as a way to tackle high-dimensional economic models that resist classical reduction techniques. It demonstrates their application in pricing, inventory control, and strategic games, and notes limitations such as brittleness and sample inefficiency.

The survey addresses computational challenges in economics by offering economists a flexible framework, though its contribution is incremental: it adapts existing methods to a specific domain.

This survey (re)introduces reinforcement learning methods to economists. The curse of dimensionality limits how far exact dynamic programming can be effectively applied, forcing us to rely on suitably "small" problems or on our ability to convert "big" problems into smaller ones. While this reduction has been sufficient for many classical applications, a growing class of economic models resists it. Reinforcement learning algorithms offer a natural, sample-based extension of dynamic programming that makes problems with high-dimensional states, continuous actions, and strategic interactions tractable. I review the theory connecting classical planning to modern learning algorithms and demonstrate their mechanics through simulated examples in pricing, inventory control, strategic games, and preference elicitation. I also examine the practical vulnerabilities of these algorithms, noting their brittleness, sample inefficiency, sensitivity to hyperparameters, and the absence of global convergence guarantees outside of tabular settings. The successes of reinforcement learning remain strictly bounded by these constraints and by its reliance on accurate simulators. When guided by economic structure, reinforcement learning provides a remarkably flexible framework. It stands as an imperfect, but promising, addition to the computational economist's toolkit. A companion survey (Rust and Rawat, 2026b) covers the inverse problem of inferring preferences from observed behavior. All simulation code is publicly available.
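The abstract's central contrast, exact dynamic programming that needs the full transition model versus reinforcement learning that needs only simulated draws, can be illustrated with a small sketch. The inventory problem, its parameters (`CAP`, `PRICE`, `ORDER_COST`, `HOLD_COST`, `GAMMA`), and the function names `step`, `value_iteration`, and `q_learning` are hypothetical and are not taken from the survey or its companion code. The learning rule is textbook synchronous tabular Q-learning with a rescaled linear step size, one variant among many.

```python
import numpy as np

# Hypothetical toy inventory problem; every parameter is an illustrative assumption.
CAP = 5                       # storage capacity; states are stock levels 0..CAP
ACTIONS = np.arange(CAP + 1)  # order quantities 0..CAP
DEMANDS = np.arange(4)        # demand is uniform on {0, 1, 2, 3}
PRICE, ORDER_COST, HOLD_COST, GAMMA = 4.0, 2.0, 0.5, 0.8


def step(s, a, d):
    """Simulator: works elementwise on arrays of states, orders, and demands."""
    y = np.minimum(s + a, CAP)  # stock after delivery (units beyond CAP are wasted)
    sales = np.minimum(y, d)
    s_next = y - sales
    reward = PRICE * sales - ORDER_COST * a - HOLD_COST * s_next
    return reward, s_next


def value_iteration(tol=1e-8):
    """Exact DP: uses the full demand distribution, not just samples."""
    s, a, d = np.meshgrid(np.arange(CAP + 1), ACTIONS, DEMANDS, indexing="ij")
    reward, s_next = step(s, a, d)
    q = np.zeros((CAP + 1, len(ACTIONS)))
    while True:
        v = q.max(axis=1)
        # Expectation over uniform demand = average over the last axis.
        q_new = (reward + GAMMA * v[s_next]).mean(axis=2)
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new


def q_learning(n_iter=20_000, seed=0):
    """Sample-based DP: each sweep draws one demand per (state, action) pair."""
    rng = np.random.default_rng(seed)
    s, a = np.meshgrid(np.arange(CAP + 1), ACTIONS, indexing="ij")
    q = np.zeros((CAP + 1, len(ACTIONS)))
    for k in range(n_iter):
        d = rng.integers(0, len(DEMANDS), size=q.shape)  # simulated demand draws
        reward, s_next = step(s, a, d)
        target = reward + GAMMA * q.max(axis=1)[s_next]
        alpha = 1.0 / (1.0 + (1.0 - GAMMA) * k)  # rescaled linear step size
        q += alpha * (target - q)
    return q


if __name__ == "__main__":
    q_star = value_iteration()
    q_hat = q_learning()
    print("exact policy:  ", q_star.argmax(axis=1))
    print("learned policy:", q_hat.argmax(axis=1))
    print("max |Q_hat - Q*|:", np.max(np.abs(q_hat - q_star)))
```

With only six stock levels the two estimates should agree closely. The design point is that `q_learning` touches the model only through calls to the simulator `step`, so it needs no closed-form transition probabilities. Scaling it to large state spaces would additionally require function approximation, which is where the convergence guarantees mentioned in the abstract are lost.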

