LG · Feb 10

Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk

arXiv:2602.09300v1 · h-index: 7
Originality: Incremental advance
AI Analysis

This paper offers incremental improvements to risk-sensitive reinforcement learning methods, of interest to researchers and practitioners working on safe AI and decision-making under uncertainty.

The authors address risk-sensitive reinforcement learning for three families of risk measures: expectiles, utility-based shortfall risk, and optimized certainty equivalent risk. For each family they derive a policy gradient theorem and a gradient estimator whose mean-squared error is O(1/m), where m is the number of trajectories. They also prove smoothness of the risk-sensitive objective, which yields convergence rate bounds for the resulting policy gradient algorithm, and validate the approach on RL benchmarks.
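
For reference, the three risk measures are commonly defined as follows for a random variable $X$ (for example, the return or the loss of a trajectory). These are the standard textbook forms, given here as an assumption; the paper's exact conventions (the sign of $X$, the normalization of $\ell$ and $\phi$) may differ.

$$
\begin{aligned}
e_\tau(X) &= \arg\min_{e \in \mathbb{R}} \mathbb{E}\big[\tau\,(X-e)_+^2 + (1-\tau)\,(e-X)_+^2\big], \qquad \tau \in (0,1),\\
\mathrm{SR}_{\ell,\lambda}(X) &= \inf\big\{t \in \mathbb{R} : \mathbb{E}[\ell(X-t)] \le \lambda\big\},\\
\mathrm{OCE}_{\phi}(X) &= \inf_{\eta \in \mathbb{R}} \big\{\eta + \mathbb{E}[\phi(X-\eta)]\big\}.
\end{aligned}
$$

Here $\ell$ is increasing and convex with $\lambda$ in the interior of its range, and $\phi$ is convex and nondecreasing with $\phi(0)=0$. The expectile equivalently solves the first-order condition $\tau\,\mathbb{E}[(X-e)_+] = (1-\tau)\,\mathbb{E}[(e-X)_+]$ (with $\tau = 1/2$ giving the mean), and choosing $\phi(z) = z_+/(1-\alpha)$ in the OCE recovers $\mathrm{CVaR}_\alpha$.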

We propose risk-sensitive reinforcement learning algorithms catering to three families of risk measures, namely expectiles, utility-based shortfall risk and optimized certainty equivalent risk. For each risk measure, in the context of a finite horizon Markov decision process, we first derive a policy gradient theorem. Second, we propose estimators of the risk-sensitive policy gradient for each of the aforementioned risk measures, and establish $\mathcal{O}\left(1/m\right)$ mean-squared error bounds for our estimators, where $m$ is the number of trajectories. Further, under standard assumptions for policy gradient-type algorithms, we establish smoothness of the risk-sensitive objective, in turn leading to stationary convergence rate bounds for the overall risk-sensitive policy gradient algorithm that we propose. Finally, we conduct numerical experiments to validate the theoretical findings on popular RL benchmarks.
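
The estimation side can be illustrated with plug-in (sample-average) estimators computed from $m$ sampled trajectory returns. This is a hypothetical sketch, not the authors' estimators: the function names, the bisection solvers, the choice of CVaR as the OCE instance, and the likelihood-ratio expectile gradient are all assumptions made for illustration. The gradient comes from implicitly differentiating the expectile's first-order condition, giving $\nabla_\theta e_\tau = \mathbb{E}\big[f(R)\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\big] \,/\, \mathbb{E}\big[\tau\,\mathbb{1}\{R > e_\tau\} + (1-\tau)\,\mathbb{1}\{R < e_\tau\}\big]$ with $f(R) = \tau (R - e_\tau)_+ - (1-\tau)(e_\tau - R)_+$.

```python
import numpy as np


def expectile(x, tau, tol=1e-12, max_iter=200):
    """Plug-in tau-expectile of samples x (hypothetical helper).

    Bisection on g(e) = tau*mean((x-e)_+) - (1-tau)*mean((e-x)_+),
    which is nonincreasing in e, with g(min x) >= 0 >= g(max x).
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g = tau * np.maximum(x - mid, 0).mean() - (1 - tau) * np.maximum(mid - x, 0).mean()
        if g > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def shortfall_risk(x, loss, lam, tol=1e-12, max_iter=200):
    """Plug-in shortfall risk inf{t : mean(loss(x - t)) <= lam} (hypothetical helper).

    Assumes `loss` is continuous and increasing, so h(t) below is nonincreasing in t.
    """
    x = np.asarray(x, dtype=float)

    def h(t):
        return loss(x - t).mean() - lam

    lo, hi = x.min() - 1.0, x.max() + 1.0
    for _ in range(100):  # widen [lo, hi] until h(lo) > 0 >= h(hi)
        if h(lo) > 0 and h(hi) <= 0:
            break
        width = hi - lo
        if h(lo) <= 0:
            lo -= width
        if h(hi) > 0:
            hi += width
    else:
        raise ValueError("could not bracket the root; check that lam is in the range of loss")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return hi  # h(hi) <= 0, matching the infimum in the definition


def cvar_oce(x, alpha):
    """Plug-in CVaR_alpha as the OCE with phi(z) = max(z, 0) / (1 - alpha) (hypothetical helper).

    The objective eta + mean(max(x - eta, 0)) / (1 - alpha) is convex and piecewise
    linear with kinks at the samples, so its minimum is attained at some eta in x.
    """
    x = np.asarray(x, dtype=float)
    excess = np.maximum(x[None, :] - x[:, None], 0)  # row i holds (x - x_i)_+
    return float((x + excess.mean(axis=1) / (1 - alpha)).min())


def expectile_policy_gradient(returns, score_sums, tau):
    """Likelihood-ratio estimate of grad_theta e_tau(R) from m trajectories (hypothetical).

    returns:    shape (m,), total return R_i of each trajectory
    score_sums: shape (m, d), sum_t grad_theta log pi_theta(a_t | s_t) per trajectory
    """
    R = np.asarray(returns, dtype=float)
    S = np.asarray(score_sums, dtype=float)
    e = expectile(R, tau)  # plug-in expectile replaces the true e_tau
    f = tau * np.maximum(R - e, 0) - (1 - tau) * np.maximum(e - R, 0)
    num = (f[:, None] * S).mean(axis=0)  # sample mean of f(R) * score
    den = (tau * (R > e) + (1 - tau) * (R < e)).mean()  # sample mean of -dF/de
    return num / den
```

For a one-step Gaussian policy $a \sim \mathcal{N}(\theta, 1)$ with return $R = a$, every expectile shifts one-for-one with $\theta$, so `expectile_policy_gradient(a, (a - theta)[:, None], tau)` should return a value close to 1 for large $m$. The $O(m^2)$ CVaR evaluation is chosen for clarity; sorting the samples is the usual choice when $m$ is large.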
