Regret Bounds for Risk-Sensitive Reinforcement Learning
This work addresses the need for safe and reliable decision-making in domains such as healthcare and robotics by providing a foundational regret analysis for risk-sensitive reinforcement learning (RL).
The paper studies RL with risk-sensitive objectives, such as the conditional value at risk (CVaR), in safety-critical applications and proves the first regret bounds for this general class of objectives.
In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.
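The difference between optimizing tail outcomes and optimizing expected reward can be made concrete with the standard empirical estimate of CVaR at level α: the mean of the worst α-fraction of sampled returns (lower tail, since higher reward is better). This is a minimal sketch of the textbook definition applied to a finite sample. It is not the paper's novel characterization of CVaR or its optimistic MDP construction, and the function name `empirical_cvar` and the example returns are hypothetical.

```python
import math


def empirical_cvar(returns, alpha):
    """Lower-tail CVaR_alpha of the empirical distribution of `returns`.

    Computes the average of the worst alpha-fraction of outcomes. When
    alpha * n is not an integer, the boundary sample receives fractional
    weight, so the result is the exact CVaR of the empirical distribution.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    xs = sorted(returns)           # worst outcomes first
    n = len(xs)
    mass = alpha * n               # (possibly fractional) number of samples in the tail
    full = math.floor(mass)        # samples lying entirely inside the tail
    total = sum(xs[:full])
    frac = mass - full
    if frac > 0:
        total += frac * xs[full]   # partial weight on the boundary sample
    return total / mass


# Two hypothetical policies with the same expected return (1.0)
# but very different tails.
safe_returns = [1.0] * 10
risky_returns = [2, 2, 2, 2, 2, 2, 2, 2, -3, -3]

print(sum(safe_returns) / len(safe_returns), empirical_cvar(safe_returns, 0.2))    # 1.0 1.0
print(sum(risky_returns) / len(risky_returns), empirical_cvar(risky_returns, 0.2))  # 1.0 -3.0
```

A risk-neutral objective cannot distinguish the two policies, while CVaR at α = 0.2 prefers the first one because it penalizes the two bad outcomes of the second. Setting α = 1 recovers the ordinary mean.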