LGCPMFRMTRDec 26, 2021

Reinforcement Learning with Dynamic Convex Risk Measures

arXiv:2112.13414v339 citations
Originality Incremental advance
AI Analysis

This provides a method for risk-aware decision-making in domains like finance and robotics, though it appears incremental as it builds on existing risk measure and RL frameworks.

The paper tackles time-consistent risk-sensitive stochastic optimization problems by developing a reinforcement learning approach using dynamic convex risk measures, resulting in a model-free actor-critic algorithm demonstrated on financial and robotics applications.

We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes