LG · May 8

Actor-Critic Algorithm for Dynamic Expectile and CVaR

arXiv:2605.0785728.1

Predicted impact: top 60% in LG · last 90 days · Originality: Incremental advance

AI Analysis

The paper provides a model-free solution for dynamic risk optimization in reinforcement learning, addressing a known bottleneck in risk-sensitive policy optimization.

The paper proposes a model-free off-policy actor-critic algorithm for dynamic expectile and CVaR optimization, combining a surrogate policy gradient that avoids transition perturbation with elicitability-based value learning. Empirical results show that it outperforms existing methods on risk-averse tasks.
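Elicitability means a statistic is the minimizer of an expected scoring function, so it can be learned from samples by stochastic gradient descent without a transition model. The sketch below is a minimal, standard illustration under our own assumptions, not the paper's critic: the function names (`expectile_loss`, `expectile`, `expectile_sgd_step`, `lower_cvar`) are hypothetical, rewards are treated as higher-is-better, and only one-step (static) statistics are shown. The τ-expectile minimizes the asymmetric squared loss E[|τ − 1{Y < e}|(Y − e)²], and τ < 0.5 gives a risk-averse value below the mean. CVaR alone is not elicitable (it is jointly elicitable with VaR), so here it is computed through the Rockafellar–Uryasev variational formula instead.

```python
from typing import Sequence


def expectile_loss(e: float, ys: Sequence[float], tau: float) -> float:
    """Asymmetric squared (expectile) scoring function, averaged over samples."""
    total = 0.0
    for y in ys:
        w = tau if y >= e else 1.0 - tau  # weight tau above e, 1 - tau below
        total += w * (y - e) ** 2
    return total / len(ys)


def expectile(ys: Sequence[float], tau: float, iters: int = 100) -> float:
    """Sample tau-expectile via bisection on the first-order condition
    tau * E[(Y - e)^+] = (1 - tau) * E[(e - Y)^+]."""
    lo, hi = min(ys), max(ys)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = sum(tau * max(y - mid, 0.0) - (1.0 - tau) * max(mid - y, 0.0) for y in ys)
        if g > 0:  # residual is decreasing in e; positive means the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expectile_sgd_step(e: float, y: float, tau: float, lr: float) -> float:
    """One model-free update from a single sample: a descent step on the
    expectile loss (the factor 2 of the gradient is folded into lr)."""
    w = tau if y >= e else 1.0 - tau
    return e + lr * w * (y - e)


def lower_cvar(xs: Sequence[float], alpha: float) -> float:
    """Lower-tail CVaR of rewards (0 < alpha <= 1) via Rockafellar-Uryasev:
    CVaR_alpha(X) = max_z { z - E[(z - X)^+] / alpha }.
    The objective is concave and piecewise linear with kinks at the samples,
    so its maximum is attained at one of the sample values."""
    n = len(xs)

    def objective(z: float) -> float:
        return z - sum(max(z - x, 0.0) for x in xs) / (n * alpha)

    return max(objective(z) for z in xs)


if __name__ == "__main__":
    rewards = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(round(expectile(rewards, 0.5), 6))  # 2.0 (tau = 0.5 recovers the mean)
    print(round(expectile(rewards, 0.1), 4))  # 0.7692, below the mean
    print(lower_cvar([float(i) for i in range(10)], 0.2))  # 0.5, mean of the lowest 20%
```

In a dynamic (nested) formulation, a one-step statistic like these would be applied recursively to a target such as r + γV(s′) at every state; how the paper assembles its critic from such pieces is not reproduced here.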

Optimizing dynamic risk with stochastic policies is challenging in both policy updates and value learning. The former typically requires transition perturbation, while the latter may rely on model-based approaches. To address these challenges, we propose a surrogate policy gradient without transition perturbation under softmax policy parameterization. We further develop model-free value learning methods for dynamic expectile and conditional value-at-risk by leveraging elicitability. Finally, inspired by Expected SARSA and Expected Policy Gradient, we construct a model-free off-policy actor-critic algorithm. Empirical results in domains with verifiable risk-averse behavior show that our algorithm can learn risk-averse policies and consistently outperforms existing methods.
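The abstract cites Expected SARSA and Expected Policy Gradient as inspiration for the off-policy actor-critic. Below is a minimal tabular, risk-neutral sketch of those two standard ingredients under a softmax policy. It is an illustration under our own assumptions, not the paper's surrogate gradient for dynamic risk, and the helper names (`softmax`, `expected_sarsa_target`, `expected_policy_gradient`) are hypothetical. Expected SARSA averages Q(s′, ·) under the target policy instead of using the behavior policy's sampled next action, which is what makes it usable off-policy. For per-state logits θ_s, the all-actions (expected) gradient of Σ_a π(a|s)Q(s,a) with respect to θ_s,b is π(b|s)(Q(s,b) − V(s)).

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Softmax policy pi(.|s) from per-action logits theta_s."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def expected_sarsa_target(r: float, q_next: List[float], logits_next: List[float],
                          gamma: float, done: bool = False) -> float:
    """Expected SARSA target r + gamma * sum_a pi(a|s') Q(s', a).
    Uses the target policy's probabilities, not a sampled next action."""
    if done:
        return r
    pi = softmax(logits_next)
    return r + gamma * sum(p * q for p, q in zip(pi, q_next))


def expected_policy_gradient(q_s: List[float], logits_s: List[float]) -> List[float]:
    """All-actions gradient of sum_a pi(a|s) Q(s,a) w.r.t. the logits theta_s:
    d/d theta_b = pi(b|s) * (Q(s,b) - V(s)), with V(s) = sum_a pi(a|s) Q(s,a)."""
    pi = softmax(logits_s)
    v = sum(p * q for p, q in zip(pi, q_s))
    return [p * (q - v) for p, q in zip(pi, q_s)]


if __name__ == "__main__":
    # Uniform policy over two actions.
    print(expected_sarsa_target(1.0, [0.0, 2.0], [0.0, 0.0], gamma=0.9))  # 1.9
    print(expected_policy_gradient([0.0, 2.0], [0.0, 0.0]))  # [-0.5, 0.5]
```

One natural risk-sensitive variant would replace the expectation in the target with a risk statistic such as an expectile or CVaR; whether that matches the paper's construction is not claimed here.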

