Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States

arXiv:2606.0097010.1


AI Analysis

For researchers in decision theory and reinforcement learning, this paper identifies a structural mechanism (absorbing catastrophic states) that explains prospect-theory-like behavior from standard optimal control, potentially unifying normative and descriptive decision models.

This paper shows that risk-neutral Bellman optimality in MDPs with an absorbing catastrophic state produces prospect-theory-like behaviors (S-shaped value function, loss aversion, reflection effect) without any utility curvature or probability weighting. Across 495 configurations, the optimal policy exhibits risk-aversion near catastrophe in growth regimes and risk-seeking in decline regimes, with a closed-form loss-aversion plateau matching numerical solutions to R²=0.999.

We study risk-neutral control in Markov decision processes with an absorbing catastrophic state. Even though rewards are linear and the agent has no utility curvature, probability weighting, or framing dependence, standard Bellman optimality produces three prospect-theory-like signatures: an S-shaped value-function profile (convex near catastrophe, concave in the far field), an endogenous loss-sensitivity coefficient $λ^*(S) > 1$, and a reflection-effect policy reversal. Across 495 configurations, the optimal policy plays safe near catastrophe in positive-drift (growth) regimes despite the risky action's higher immediate expected value, and plays risky near catastrophe in negative-drift (decline) regimes despite the safe action's lower immediate expected loss. We derive a closed-form expression for the asymptotic loss-aversion plateau $\barλ$ that depends only on win probability $p$, payoff asymmetry $r = |Δ_\ell/Δ_w|$, and discount factor $β$, and matches numerical solutions to $R^2 = 0.999$. The mechanism does not require asymmetric payoffs. Across a sweep of $(p,β)$ at three asymmetry levels, the asymmetry share of $\barλ$ above unity has median 4.6% at $r = 1.25$ and rises to 13.9% at $r = 2$, with the boundary contribution exceeding the asymmetry contribution in every cell tested. The phenomena persist under tabular Q-learning (a model-free agent reproduces $V^*$ at correlation 0.98 in growth and 1.00 in decline) and under stochastic transitions with Gaussian, heavy-tailed Student-$t_3$, and asymmetric skew-normal noise up to 50% of the step size, where the asymptotic plateau tracks the closed-form prediction within 0.41% for safe-channel noise and within 9.6% for risky-channel or both-channel noise. These results identify absorbing failure states as a sufficient structural mechanism for prospect-theory-like behavior under optimal control.
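The mechanism in the abstract can be illustrated with a minimal value-iteration sketch. It is not the paper's model, whose state space, reward definition, and parameter grid are not given here. The sketch assumes a one-dimensional integer chain where state 0 is the absorbing catastrophe with value zero. It also assumes a deterministic safe action, a risky win/lose action, a linear reward equal to the nominal step, and a cap at the top state. The function name `solve_chain_mdp` and all parameter values are hypothetical, chosen so that the risky action has the higher immediate expected value (1.4 versus 1.0), as in the growth regime described above.

```python
import numpy as np

SAFE, RISKY = 0, 1


def solve_chain_mdp(n_states=60, beta=0.95, p=0.6, d_win=3, d_loss=1,
                    d_safe=1, tol=1e-10, max_iter=100_000):
    """Risk-neutral value iteration on a toy chain MDP (illustrative assumptions).

    State 0 is the absorbing catastrophe (value fixed at 0). In states >= 1:
      SAFE  moves +d_safe with reward d_safe;
      RISKY moves +d_win with probability p (reward d_win),
            otherwise -d_loss (reward -d_loss).
    Next states are clipped to [0, n_states - 1]; rewards are never clipped.
    """
    top = n_states - 1
    s = np.arange(1, n_states)  # non-catastrophic states

    def q_values(V):
        # One-step lookahead for both actions in every state >= 1.
        nxt = lambda delta: np.clip(s + delta, 0, top)
        q_safe = d_safe + beta * V[nxt(d_safe)]
        q_risky = (p * (d_win + beta * V[nxt(d_win)])
                   + (1 - p) * (-d_loss + beta * V[nxt(-d_loss)]))
        return q_safe, q_risky

    V = np.zeros(n_states)
    for _ in range(max_iter):
        q_safe, q_risky = q_values(V)
        V_new = np.zeros(n_states)  # V_new[0] stays 0: catastrophe is absorbing
        V_new[1:] = np.maximum(q_safe, q_risky)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    # Greedy policy with respect to the converged values.
    q_safe, q_risky = q_values(V)
    policy = np.full(n_states, SAFE)  # entry 0 is a placeholder (no decision there)
    policy[1:] = np.where(q_risky > q_safe, RISKY, SAFE)
    return V, policy


if __name__ == "__main__":
    V, policy = solve_chain_mdp()
    for state in range(1, 8):
        name = "risky" if policy[state] == RISKY else "safe"
        print(f"S={state}: V={V[state]:.3f}, action={name}")
```

In this toy setting the safe choice next to the boundary can be checked by hand. Playing safe forever from $S = 1$ is worth $1/(1-β) = 20$, while the risky action there is worth at most $1.4 + 0.6 β \cdot 28 = 17.36$, because a loss is absorbed at zero value and no state can be worth more than $1.4/(1-β) = 28$. The agent therefore plays safe at $S = 1$ even though the risky action has the higher immediate expected reward. How far the safe region extends, and how this compares with the paper's 495 configurations, is not something this sketch establishes.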
