LG, OC, ML
May 2, 2024

Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk

arXiv:2405.01718v1, 6 citations, h-index: 5, ITW
Originality: Incremental advance
AI Analysis

This work addresses robustness and risk sensitivity in reinforcement learning for applications with uncertain environments, representing an incremental advancement by extending existing RMDP frameworks to incorporate CVaR and decision-dependent uncertainty.

The paper studies robust risk-sensitive reinforcement learning by analyzing Conditional Value-at-Risk (CVaR) under Robust Markov Decision Processes (RMDPs). For predetermined ambiguity sets, it connects robustness to risk sensitivity through the coherency of CVaR. For state-action-dependent ambiguity sets, it defines a new risk measure, NCVaR, and shows that NCVaR optimization is equivalent to robust CVaR optimization. It then proposes value iteration algorithms and validates them in simulation experiments.

Robust Markov Decision Processes (RMDPs) have received significant research interest as an alternative to standard Markov Decision Processes (MDPs), which typically assume fixed transition probabilities. RMDPs relax this assumption by optimizing for the worst case within an ambiguity set. Earlier studies on RMDPs have largely centered on risk-neutral reinforcement learning (RL), where the goal is to minimize the expected total discounted cost. In this paper, we instead analyze the robustness of CVaR-based risk-sensitive RL under RMDPs. First, we consider predetermined ambiguity sets. Based on the coherency of CVaR, we establish a connection between robustness and risk sensitivity, so that techniques from risk-sensitive RL can be adopted to solve the proposed problem. Second, motivated by the decision-dependent uncertainty found in real-world problems, we study problems with state-action-dependent ambiguity sets. To solve these, we define a new risk measure named NCVaR and establish the equivalence between NCVaR optimization and robust CVaR optimization. We further propose value iteration algorithms and validate our approach in simulation experiments.
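
The link between coherency, risk sensitivity, and robustness mentioned in the abstract can be illustrated on a single discrete cost distribution. The sketch below is not taken from the paper. It assumes the common cost-based convention in which CVaR at level alpha is the mean of the worst alpha-fraction of costs, and it compares the Rockafellar-Uryasev minimization form with the dual form, in which CVaR is a worst-case expectation over reweighted distributions q with q_i <= p_i / alpha. The function names cvar_primal and cvar_dual and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np


def cvar_primal(costs, probs, alpha):
    """CVaR of a discrete cost via min over w of w + E[(X - w)^+] / alpha."""
    costs = np.asarray(costs, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # The objective is convex and piecewise linear in w with kinks at the
    # atoms, so a minimizer can be found among the atoms themselves.
    values = [w + probs @ np.maximum(costs - w, 0.0) / alpha for w in costs]
    return float(min(values))


def cvar_dual(costs, probs, alpha):
    """CVaR as a worst-case expectation over q with 0 <= q_i <= p_i / alpha."""
    costs = np.asarray(costs, dtype=float)
    probs = np.asarray(probs, dtype=float)
    q = np.zeros_like(probs)
    remaining = 1.0
    # Greedy solution of the linear program: put as much probability mass
    # as the cap allows on the highest-cost outcomes first.
    for i in np.argsort(-costs):
        q[i] = min(probs[i] / alpha, remaining)
        remaining -= q[i]
    return float(q @ costs), q


# Toy example (illustrative numbers only).
costs = [1.0, 2.0, 10.0]
probs = [0.5, 0.4, 0.1]
alpha = 0.2

primal_value = cvar_primal(costs, probs, alpha)
dual_value, worst_case_q = cvar_dual(costs, probs, alpha)
print(primal_value, dual_value, worst_case_q)
```

For these toy numbers both forms give approximately 6.0, and the maximizing q shifts all mass onto the two highest costs. Read this way, a risk-sensitive CVaR objective is a robust expectation over a particular set of perturbed distributions, which is the kind of connection the abstract describes for predetermined ambiguity sets.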
