LG · OC · Mar 22, 2025

Planning and Learning in Average Risk-aware MDPs

arXiv:2503.17629v2 · 2 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the need for risk-aware decision-making in continuing tasks for agents in fields like finance or robotics, though it is incremental as it builds on existing risk-neutral methods.

The authors extend risk-neutral algorithms for average cost Markov decision processes to accommodate dynamic risk measures. They propose a relative value iteration algorithm for planning and two model-free Q-learning algorithms for learning. Numerical experiments support the convergence analysis and show that the approach yields policies finely tuned to the agent's risk-awareness.

For continuing tasks, average cost Markov decision processes have well-documented value and can be solved using efficient algorithms. However, this framework explicitly assumes that the agent is risk-neutral. In this work, we extend risk-neutral algorithms to accommodate the more general class of dynamic risk measures. Specifically, we propose a relative value iteration (RVI) algorithm for planning and design two model-free Q-learning algorithms, namely a generic algorithm based on the multi-level Monte Carlo (MLMC) method, and an off-policy algorithm dedicated to utility-based shortfall risk measures. Both the RVI and MLMC-based Q-learning algorithms are proven to converge to optimality. Numerical experiments validate our analysis, empirically confirm the convergence of the off-policy algorithm, and demonstrate that our approach enables the identification of policies that are finely tuned to the intricate risk-awareness of the agent that they serve.
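
To make the planning idea concrete, here is a minimal sketch of a relative value iteration loop in which the usual expectation over next states is replaced by a one-step risk measure. It is not the authors' algorithm: the entropic risk measure, the function names (`entropic_risk`, `risk_aware_rvi`), the parameter `beta`, the reference-state normalization, and the toy MDP are all assumptions chosen for illustration. Setting `beta = 0` recovers the risk-neutral average cost recursion.

```python
import math

def entropic_risk(values, probs, beta):
    """One-step entropic risk (1/beta) * log E[exp(beta * X)].

    Illustrative choice of risk measure (an assumption, not taken from the paper).
    beta == 0 falls back to the plain expectation (risk-neutral case).
    """
    if beta == 0.0:
        return sum(p * v for p, v in zip(probs, values))
    m = max(values)  # shift by the max for numerical stability
    s = sum(p * math.exp(beta * (v - m)) for p, v in zip(probs, values))
    return m + math.log(s) / beta

def risk_aware_rvi(P, c, beta, ref_state=0, tol=1e-10, max_iter=10_000):
    """Relative value iteration with a risk measure in place of the expectation.

    P[s][a] is a list of next-state probabilities, c[s][a] is the one-step cost.
    Returns (gain estimate, relative value function h, greedy policy).
    """
    n_states, n_actions = len(c), len(c[0])
    h = [0.0] * n_states
    gain, policy = 0.0, [0] * n_states
    for _ in range(max_iter):
        # Risk-aware Q-values: immediate cost plus risk of the next relative value.
        q = [[c[s][a] + entropic_risk(h, P[s][a], beta) for a in range(n_actions)]
             for s in range(n_states)]
        policy = [min(range(n_actions), key=lambda a, s=s: q[s][a])
                  for s in range(n_states)]
        th = [min(row) for row in q]
        gain = th[ref_state]             # value at the reference state
        h_new = [v - gain for v in th]   # subtract it to keep iterates bounded
        converged = max(abs(x - y) for x, y in zip(h_new, h)) < tol
        h = h_new
        if converged:
            break
    return gain, h, policy

# Hypothetical two-state, two-action MDP with strictly positive transitions.
P = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.5, 0.5], [0.1, 0.9]]]
c = [[1.0, 0.0],
     [3.0, 4.0]]
for beta in (0.0, 1.0):
    gain, h, policy = risk_aware_rvi(P, c, beta)
    print(f"beta={beta}: gain={gain:.4f}, policy={policy}")
```

Comparing the output for `beta = 0` and a positive `beta` gives a rough feel for how the risk parameter shifts both the long-run cost estimate and, potentially, the greedy policy.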
