LG | Aug 24, 2023

Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory

Karthik Somayaji NS, Yu Wang, Malachi Schram, Jan Drgona, Mahantesh Halappanavar, Frank Liu, Peng Li

arXiv:2308.13011v15.35 citations | h-index: 63

Originality: Incremental advance

AI Analysis

This work addresses the problem of improving safety for RL agents in real-world applications by mitigating extreme risks, though it appears incremental as it builds on existing risk-aware RL techniques.

The paper tackles the challenge of modeling rare catastrophic events in risk-sensitive reinforcement learning by using extreme value theory to parameterize the extreme values of the state-action value function distribution. It reports that the proposed method outperforms other risk-averse RL algorithms on diverse benchmark tasks.

Risk-sensitive reinforcement learning (RL) has garnered significant attention in recent years due to the growing interest in deploying RL agents in real-world scenarios. A critical aspect of risk awareness involves modeling highly rare risk events (rewards) that could potentially lead to catastrophic outcomes. These infrequent occurrences present a formidable challenge for data-driven methods aiming to capture such risky events accurately. While risk-aware RL techniques do exist, their level of risk aversion heavily relies on the precision of the state-action value function estimation when modeling these rare occurrences. Our work proposes to enhance the resilience of RL agents when faced with very rare and risky events by focusing on refining the predictions of the extreme values of the state-action value function distribution. To achieve this, we formulate the extreme values of the state-action value function distribution as parameterized distributions, drawing inspiration from the principles of extreme value theory (EVT). This approach effectively addresses the issue of infrequent occurrence by leveraging EVT-based parameterization. Importantly, we theoretically demonstrate the advantages of employing these parameterized distributions in contrast to other risk-averse algorithms. Our evaluations show that the proposed method outperforms other risk-averse RL algorithms on a diverse range of benchmark tasks, each encompassing distinct risk scenarios.
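
To make the idea of an EVT-based tail parameterization concrete, here is a minimal sketch of the classical peaks-over-threshold recipe: values beyond a high threshold are modeled with a generalized Pareto distribution (GPD), and tail risk measures are read off the fitted parameters. This is not the paper's algorithm. The function names (`fit_gpd_moments`, `tail_risk`), the method-of-moments fit, the 0.9 threshold quantile, and the convention that losses are negated returns are all assumptions made for illustration.

```python
import math
import statistics


def fit_gpd_moments(exceedances):
    """Method-of-moments GPD fit (hypothetical helper, not from the paper).

    Uses mean = sigma / (1 - xi) and var = sigma^2 / ((1 - xi)^2 (1 - 2 xi)),
    which hold for shape xi < 1/2. Returns (xi, sigma).
    """
    m = statistics.mean(exceedances)
    v = statistics.variance(exceedances)
    ratio = m * m / v                    # equals 1 - 2*xi under the GPD
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return xi, sigma


def tail_risk(losses, threshold_quantile=0.9, level=0.99):
    """Estimate VaR and CVaR at `level` from a GPD fitted to the loss tail.

    Assumes `level` lies above `threshold_quantile` and that larger
    values of `losses` are worse (e.g. negated returns).
    """
    ordered = sorted(losses)
    n = len(ordered)
    u = ordered[int(threshold_quantile * n)]          # tail threshold
    exceedances = [x - u for x in ordered if x > u]
    xi, sigma = fit_gpd_moments(exceedances)
    # Probability of exceeding VaR, conditional on being above the threshold.
    p = (1.0 - level) / (len(exceedances) / n)
    if abs(xi) < 1e-8:
        var = u - sigma * math.log(p)                 # exponential-tail limit
    else:
        var = u + (sigma / xi) * (p ** (-xi) - 1.0)
    cvar = (var + sigma - xi * u) / (1.0 - xi)        # valid for xi < 1
    return {"xi": xi, "sigma": sigma, "var": var, "cvar": cvar}


# Synthetic losses: exact quantiles of a unit exponential distribution,
# whose tail is a GPD with xi = 0 and sigma = 1.
n = 10_000
losses = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
print(tail_risk(losses))
```

Because only two parameters describe the whole tail, an estimate at the 0.99 level draws on every sample above the threshold instead of the handful of samples that fall beyond that level, which is the data-efficiency argument usually made for EVT.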
