LG RM Jun 21, 2024

Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

arXiv:2406.15612v24 citations
Originality: Incremental advance
AI Analysis

The paper addresses catastrophic risk management in domains such as finance, though it appears incremental because it adapts existing methods to a specific type of risk.

The paper addresses the mitigation of catastrophic risk in sequential decision-making. It develops a policy gradient algorithm (POTPG) based on extreme value theory, which in numerical experiments outperforms benchmarks that rely on the empirical distribution, and it applies the method to the hedging of a financial option.

This paper tackles the problem of mitigating catastrophic risk (that is, risk with very low frequency but very high severity) in the context of a sequential decision-making process. This problem is particularly challenging because observations are scarce in the far tail of the distribution of cumulative costs (negative rewards). We develop a policy gradient algorithm, which we call POTPG, based on approximations of the tail risk derived from extreme value theory. Numerical experiments highlight the outperformance of our method over common benchmarks that rely on the empirical distribution. An application to financial risk management, more precisely to the dynamic hedging of a financial option, is presented.
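To make the contrast between an extreme-value-theory tail approximation and the empirical distribution concrete, the sketch below estimates the conditional value-at-risk (CVaR) of a sample of cumulative costs in two ways: a peaks-over-threshold (POT) fit of a generalized Pareto distribution (GPD) to the excesses above a threshold, and the plain empirical tail average. This is not the authors' POTPG algorithm, which the abstract does not spell out. It only illustrates the kind of tail-risk estimate such a method could build on. The function names `pot_cvar` and `empirical_cvar`, the 90% threshold quantile, and the use of a method-of-moments fit (in place of the more common maximum-likelihood fit) are all assumptions made for illustration.

```python
import math
import random
import statistics


def pot_cvar(costs, alpha=0.99, threshold_quantile=0.90):
    """Hypothetical helper: POT estimate of (VaR, CVaR) at level alpha.

    Assumes alpha > threshold_quantile and a GPD shape xi < 1/2,
    so that the method-of-moments fit below is well defined.
    """
    xs = sorted(costs)
    n = len(xs)
    u = xs[int(threshold_quantile * n)]       # threshold: an empirical quantile
    excesses = [x - u for x in xs if x > u]   # peaks over the threshold
    zeta = len(excesses) / n                  # fraction of samples above u

    # Method-of-moments GPD fit (an illustrative stand-in for maximum likelihood).
    m = statistics.fmean(excesses)
    s2 = statistics.variance(excesses)
    ratio = m * m / s2
    xi = 0.5 * (1.0 - ratio)                  # shape
    sigma = 0.5 * m * (1.0 + ratio)           # scale

    # Standard GPD tail formulas for VaR and CVaR.
    tail = (1.0 - alpha) / zeta
    if abs(xi) < 1e-9:
        var = u - sigma * math.log(tail)      # exponential-tail limit (xi -> 0)
    else:
        var = u + sigma / xi * (tail ** (-xi) - 1.0)
    cvar = (var + sigma - xi * u) / (1.0 - xi)
    return var, cvar


def empirical_cvar(costs, alpha=0.99):
    """Benchmark: empirical quantile and the average of the costs at or above it."""
    xs = sorted(costs)
    k = int(alpha * len(xs))
    return xs[k], statistics.fmean(xs[k:])


if __name__ == "__main__":
    random.seed(0)
    sample = [random.expovariate(1.0) for _ in range(20000)]
    print("POT      :", pot_cvar(sample))
    print("empirical:", empirical_cvar(sample))
```

For unit-rate exponential costs the exact values are VaR = ln(100) and CVaR = ln(100) + 1, which gives a reference point for both estimators. The POT estimate uses every excess above the threshold rather than only the few samples beyond the target quantile, which is the motivation for tail approximations when far-tail observations are scarce.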

Code Implementations: 1 repo
