LG RM · Jun 21, 2024

Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

Parisa Davar, Frédéric Godin, Jose Garrido

arXiv:2406.15612v26.44 citations · Has Code

Originality: Incremental advance

AI Analysis

It addresses catastrophic risk management for domains like finance, though it appears incremental as it adapts existing methods to a specific risk type.

The paper tackles mitigating catastrophic risk in sequential decision-making by developing a policy gradient algorithm (POTPG) based on extreme value theory, which outperforms benchmarks using empirical distributions in numerical experiments and is applied to financial option hedging.

This paper tackles the problem of mitigating catastrophic risk (risk with very low frequency but very high severity) in the context of a sequential decision-making process. This problem is particularly challenging due to the scarcity of observations in the far tail of the distribution of cumulative costs (negative rewards). We develop a policy gradient algorithm, called POTPG, based on approximations of the tail risk derived from extreme value theory. Numerical experiments show that our method outperforms common benchmarks that rely on the empirical distribution. An application to financial risk management, specifically the dynamic hedging of a financial option, is presented.
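To illustrate why an extreme-value-theory approximation can help when far-tail observations are scarce, here is a minimal sketch that compares an empirical CVaR estimate with a peaks-over-threshold (POT) estimate. It is not the paper's POTPG algorithm: the abstract does not specify the risk measure or the fitting procedure, so the use of CVaR, the method-of-moments fit of the generalized Pareto distribution, the 90% threshold, and all function names (`fit_gpd_moments`, `pot_var_cvar`, `empirical_cvar`) are assumptions chosen for illustration.

```python
import math
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments fit of a generalized Pareto distribution.

    Illustrative choice only; it assumes the shape xi is below 1/2.
    Returns (xi, sigma).
    """
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return xi, sigma


def pot_var_cvar(costs, alpha, threshold_quantile=0.90):
    """POT estimates of VaR and CVaR at level alpha (alpha > threshold_quantile)."""
    ordered = sorted(costs)
    n = len(ordered)
    u = ordered[int(threshold_quantile * n)]        # threshold (hypothetical choice)
    excesses = [c - u for c in ordered if c > u]    # exceedances over the threshold
    xi, sigma = fit_gpd_moments(excesses)
    p = (1.0 - alpha) / (len(excesses) / n)         # tail probability rescaled to the exceedances
    if abs(xi) < 1e-9:
        var = u - sigma * math.log(p)               # exponential-tail limit (xi -> 0)
    else:
        var = u + sigma / xi * (p ** (-xi) - 1.0)
    cvar = (var + sigma - xi * u) / (1.0 - xi)      # standard GPD tail formula, needs xi < 1
    return var, cvar


def empirical_cvar(costs, alpha):
    """Average of the costs at or above the empirical alpha-quantile."""
    ordered = sorted(costs)
    k = int(alpha * len(ordered))
    return statistics.fmean(ordered[k:])


# Synthetic cumulative costs (stand-in data, not from the paper).
random.seed(0)
costs = [random.expovariate(1.0) for _ in range(5000)]

var_pot, cvar_pot = pot_var_cvar(costs, alpha=0.999)
cvar_emp = empirical_cvar(costs, alpha=0.999)
print(f"POT VaR: {var_pot:.3f}  POT CVaR: {cvar_pot:.3f}  empirical CVaR: {cvar_emp:.3f}")
```

At the 99.9% level, the empirical estimate in this sketch averages only about five observations out of 5000, whereas the POT estimate extrapolates from roughly 500 exceedances. That is the scarcity problem the abstract describes.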
