LG | Oct 11, 2022

Regret Bounds for Risk-Sensitive Reinforcement Learning

arXiv:2210.05650v1 | 26 citations | h-index: 41
Originality: Highly original
AI Analysis

This work addresses the need for safe and reliable decision-making in domains like healthcare and robotics by providing foundational regret analysis for risk-sensitive RL.

The paper studies reinforcement learning with risk-sensitive objectives, such as CVaR, for safety-critical applications, and proves the first regret bounds for this general class of objectives.

In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.
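For intuition about what "accounting for tail outcomes" means, here is a minimal sketch of one common sample-based reading of CVaR: the average of the worst α-fraction of observed returns. It assumes the lower-tail convention for returns, which may differ in detail from the paper's formal definition. The function name `empirical_cvar` and the sample returns are hypothetical illustrations, not taken from the paper, and this is not the paper's algorithm.

```python
import math


def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail).

    Illustrative helper only; assumes higher returns are better.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # lowest (worst) outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))  # number of tail samples kept
    tail = ordered[:k]
    return sum(tail) / k


# Hypothetical episode returns: mostly good, with two bad outcomes.
returns = [10.0, 12.0, -5.0, 8.0, 11.0, -20.0, 9.0, 10.0, 7.0, 13.0]

mean_return = sum(returns) / len(returns)  # risk-neutral objective
cvar_20 = empirical_cvar(returns, 0.2)     # average of the worst 20% of returns

print(f"mean return: {mean_return}")
print(f"CVaR at alpha=0.2: {cvar_20}")
```

With α = 1 the tail covers every sample, so this estimate reduces to the ordinary mean. Smaller α puts more weight on the worst outcomes, which is why a policy with a good average return can still score poorly under CVaR.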

