LG, CY, OC, ML | Jun 27, 2012

Policy Gradients with Variance Related Risk Criteria

arXiv:1206.6404v1 | 237 citations
Originality: Highly original
AI Analysis

This work addresses risk management in dynamic decision problems in fields such as finance and process control, offering policy gradient algorithms for variance-related risk criteria.

Optimizing many variance-related risk criteria in reinforcement learning is known to be NP-hard. The paper therefore develops a framework of local policy gradient algorithms for criteria that combine the expected cost and the variance of the cost, proves that these algorithms converge to local minima, and demonstrates their applicability in a portfolio planning problem.

Managing risk in dynamic decision problems is of cardinal importance in many fields such as finance and process control. The most common approach to defining risk is through various variance related criteria such as the Sharpe Ratio or the standard deviation adjusted reward. It is known that optimizing many of the variance related risk criteria is NP-hard. In this paper we devise a framework for local policy gradient style algorithms for reinforcement learning for variance related criteria. Our starting point is a new formula for the variance of the cost-to-go in episodic tasks. Using this formula we develop policy gradient algorithms for criteria that involve both the expected cost and the variance of the cost. We prove the convergence of these algorithms to local minima and demonstrate their applicability in a portfolio planning problem.
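
To make the idea of a gradient for both the mean and the variance concrete, here is a minimal sketch. It is not the paper's algorithm and does not use the paper's formula for the variance of the cost-to-go. It relies only on the standard identity Var[R] = E[R^2] - (E[R])^2 and the likelihood-ratio (score function) trick. The one-step toy problem (a "safe" asset with a fixed return and a "risky" asset with a Gaussian return), the sigmoid policy, and all function and parameter names are assumptions made for illustration. It is written in terms of returns rather than costs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_episode(theta, rng, mu_risky=1.0, sigma_risky=1.0, safe_return=0.5):
    """One-step toy episode (hypothetical setup, not from the paper).

    Returns (total return, score), where score = d/dtheta log pi(a).
    """
    p = sigmoid(theta)                 # probability of picking the risky asset
    a = 1 if rng.random() < p else 0
    ret = rng.gauss(mu_risky, sigma_risky) if a == 1 else safe_return
    score = a - p                      # score of a Bernoulli(sigmoid(theta)) policy
    return ret, score


def mean_variance_gradients(theta, n_episodes, rng):
    """Likelihood-ratio estimates of dJ/dtheta and dV/dtheta.

    J = E[R], M = E[R^2], V = M - J^2, so grad V = grad M - 2 * J * grad J.
    Plugging in the sample mean for J reuses the same episodes, which adds
    a small bias that shrinks as n_episodes grows.
    """
    sum_r = sum_r_score = sum_r2_score = 0.0
    for _ in range(n_episodes):
        ret, score = sample_episode(theta, rng)
        sum_r += ret
        sum_r_score += ret * score
        sum_r2_score += ret * ret * score
    j_hat = sum_r / n_episodes
    grad_j = sum_r_score / n_episodes
    grad_m = sum_r2_score / n_episodes
    return grad_j, grad_m - 2.0 * j_hat * grad_j


def exact_gradients(theta, mu_risky=1.0, sigma_risky=1.0, safe_return=0.5):
    """Closed-form gradients for the toy problem, used only as a check."""
    p = sigmoid(theta)
    dp = p * (1.0 - p)
    j = p * mu_risky + (1.0 - p) * safe_return
    grad_j = dp * (mu_risky - safe_return)
    grad_m = dp * (mu_risky ** 2 + sigma_risky ** 2 - safe_return ** 2)
    return grad_j, grad_m - 2.0 * j * grad_j


if __name__ == "__main__":
    rng = random.Random(0)
    print("estimated:", mean_variance_gradients(0.0, 200_000, rng))
    print("exact:    ", exact_gradients(0.0))
```

One possible use, again as an assumption rather than the paper's update rule, is a local ascent step on a variance-penalized objective J - lam * V, i.e. theta += step * (grad_j - lam * grad_v), with a hypothetical penalty weight lam.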
