ML AI LG | Apr 15, 2014

Optimizing the CVaR via Sampling

Aviv Tamar, Yonatan Glassner, Shie Mannor

arXiv:1404.3862v4 | 222 citations

Originality: Incremental advance

AI Analysis

This work addresses risk management in stochastic optimization, particularly for reinforcement learning applications, though it is incremental as it builds on existing CVaR and gradient methods.

The paper tackles the problem of optimizing Conditional Value at Risk (CVaR) by deriving a new gradient formula and a sampling-based estimator, and proves that the corresponding stochastic gradient descent algorithm converges to a local CVaR optimum. This enables risk-sensitive reinforcement learning in domains such as Tetris.

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the CVaR gradient, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator, and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method allows CVaR optimization to be considered in new domains. As an example, we consider a reinforcement learning application, and learn a risk-sensitive controller for the game of Tetris.
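
A minimal sketch of a likelihood-ratio style CVaR gradient estimator of the kind the abstract describes may help make the idea concrete. It rests on several assumptions that are not stated on this page: CVaR is taken as the mean of the lower alpha-tail of the outcome Z, the gradient is assumed to have the conditional-expectation form E[grad log f(Z; theta) * (Z - VaR) | Z <= VaR], and the unknown VaR is replaced by the empirical alpha-quantile of the samples. The function names and the Gaussian toy problem are hypothetical and chosen only for illustration.

```python
import numpy as np


def cvar_gradient_estimate(z, score, alpha):
    """Sampling-based estimate of the CVaR gradient (illustrative sketch).

    z     : shape (N,)   sampled outcomes Z_i drawn under parameters theta
    score : shape (N, d) grad_theta log f(Z_i; theta) for each sample
    alpha : tail level in (0, 1); CVaR is assumed to be the lower-tail mean
    """
    z = np.asarray(z, dtype=float)
    score = np.asarray(score, dtype=float)
    # Assumption: plug in the empirical alpha-quantile for the true VaR.
    var_hat = np.quantile(z, alpha)
    # Only samples in the lower tail contribute to the estimate.
    tail = z <= var_hat
    weights = (z - var_hat) * tail
    # Average over the roughly alpha * N tail samples.
    return (score * weights[:, None]).sum(axis=0) / (alpha * len(z))


def gaussian_demo(theta=0.5, alpha=0.05, n=200_000, seed=0):
    """Toy check with Z ~ Normal(theta, 1), where the score is (z - theta).

    Shifting theta shifts the whole distribution, so the true gradient
    of CVaR with respect to theta is exactly 1 in this toy problem.
    """
    rng = np.random.default_rng(seed)
    z = theta + rng.standard_normal(n)
    score = (z - theta)[:, None]
    return cvar_gradient_estimate(z, score, alpha)


if __name__ == "__main__":
    print(gaussian_demo())
```

In a stochastic gradient descent loop, an estimate like this would be recomputed from fresh samples at each iteration and used to update theta. Because the quantile is itself estimated from the same samples, the estimate is biased for finite N, which is consistent with the abstract's mention of a bias analysis.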

