LG · OC · ML · Jun 20, 2020

Entropic Risk Constrained Soft-Robust Policy Optimization

arXiv:2006.11679v1 · 6 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses risk management in reinforcement learning for high-stakes applications, but it appears incremental, since it builds on existing entropic risk measures and policy optimization methods.

The paper tackles the problem of managing risk from model uncertainties in reinforcement learning for high-stakes domains by proposing entropic risk constrained policy gradient and actor-critic algorithms, demonstrating their usefulness across several problem domains.

Having a perfect model to compute the optimal policy is often infeasible in reinforcement learning. In high-stakes domains, it is important to quantify and manage the risk induced by model uncertainty. The entropic risk measure is an exponential utility-based convex risk measure that satisfies many reasonable properties. In this paper, we propose entropic risk constrained policy gradient and actor-critic algorithms that are risk-averse to model uncertainty. We demonstrate the usefulness of our algorithms on several problem domains.
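The abstract describes the entropic risk measure only in words. As a rough illustration, and not the authors' implementation, this sketch computes the standard entropic risk of a loss L, rho_beta(L) = (1/beta) log E[exp(beta L)] with beta > 0, over a finite set of samples. It assumes, hypothetically, that each sample is the negated return of one fixed policy evaluated under a model drawn from a model distribution. The function `lagrangian_objective` is also hypothetical: it shows one generic way to pose an entropic risk constraint (soft-robust mean return minus a multiplier times the constraint slack); the paper's actual formulation may differ.

```python
import math
from typing import Optional, Sequence


def entropic_risk(losses: Sequence[float], beta: float,
                  weights: Optional[Sequence[float]] = None) -> float:
    """Entropic risk (1/beta) * log E[exp(beta * L)] of a discrete loss distribution.

    losses:  sampled losses, e.g. negated returns under sampled models (assumption).
    beta:    risk-aversion parameter; must be positive.
    weights: probability of each sample; uniform if omitted.
    """
    if beta <= 0:
        raise ValueError("beta must be positive for a risk-averse measure")
    if not losses:
        raise ValueError("need at least one loss sample")
    n = len(losses)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights and losses must have the same length")
    # Log-sum-exp trick: factor out the largest exponent so exp() cannot overflow.
    m = max(beta * x for x in losses)
    s = sum(w * math.exp(beta * x - m) for w, x in zip(weights, losses))
    return (m + math.log(s)) / beta


def lagrangian_objective(returns: Sequence[float], beta: float,
                         risk_budget: float, lam: float) -> float:
    """Hypothetical Lagrangian for: maximize E[R] s.t. entropic_risk(-R) <= risk_budget."""
    losses = [-r for r in returns]
    mean_return = sum(returns) / len(returns)  # soft-robust (model-averaged) return
    slack = entropic_risk(losses, beta) - risk_budget  # > 0 means constraint violated
    return mean_return - lam * slack
```

For returns [10, 9, 8, 2] under four equally likely models, the mean loss is -7.25, while `entropic_risk` with beta = 1 gives about -3.38, much closer to the worst-case loss of -2. As beta shrinks the measure approaches the mean, and as it grows it approaches the worst case, which is the risk-aversion knob the abstract alludes to.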

