ML · AI · LG · Apr 15, 2014

Optimizing the CVaR via Sampling

arXiv:1404.3862v4 · 222 citations
Originality: Incremental advance
AI Analysis

This work addresses risk management in stochastic optimization, with a focus on reinforcement learning applications. Its contribution is incremental in that it builds on existing CVaR formulations and gradient-estimation methods.

The paper tackles the problem of optimizing the Conditional Value at Risk (CVaR) by deriving a new gradient formula and a sampling-based estimator of that gradient. This enables risk-sensitive reinforcement learning in domains such as Tetris, and the resulting stochastic gradient descent algorithm is proven to converge to a local optimum.
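
A sketch of a gradient formula of this kind, derived here independently under simplifying assumptions (the paper's notation and regularity conditions may differ): assume a continuous loss Z with density f_Z(z; θ) differentiable in θ, let α ∈ (0, 1) denote the tail probability mass, let ν_α(θ) be the value at risk defined by P(Z ≥ ν_α) = α, and let Φ(θ) be the CVaR. Leibniz's rule produces a boundary term from the moving quantile, and differentiating the tail-mass constraint shows that this term cancels, leaving a conditional expectation. Replacing that expectation with a sample average over N draws, and ν_α with the empirical (1 − α)-quantile ν̃_α, gives a likelihood-ratio style estimator.

```latex
\begin{aligned}
\int_{\nu_\alpha}^{\infty} f_Z(z;\theta)\,dz = \alpha
  &\;\Rightarrow\; f_Z(\nu_\alpha;\theta)\,\nabla_\theta \nu_\alpha
   = \int_{\nu_\alpha}^{\infty} \nabla_\theta f_Z(z;\theta)\,dz,\\
\Phi(\theta) = \mathbb{E}_\theta[Z \mid Z \ge \nu_\alpha]
  &= \frac{1}{\alpha}\int_{\nu_\alpha}^{\infty} z\, f_Z(z;\theta)\,dz,\\
\nabla_\theta \Phi(\theta)
  &= \frac{1}{\alpha}\Big[-\nu_\alpha f_Z(\nu_\alpha;\theta)\,\nabla_\theta \nu_\alpha
     + \int_{\nu_\alpha}^{\infty} z\,\nabla_\theta f_Z(z;\theta)\,dz\Big]
   = \frac{1}{\alpha}\int_{\nu_\alpha}^{\infty} (z-\nu_\alpha)\,\nabla_\theta f_Z(z;\theta)\,dz\\
  &= \mathbb{E}_\theta\big[(Z-\nu_\alpha)\,\nabla_\theta \log f_Z(Z;\theta) \,\big|\, Z \ge \nu_\alpha\big],\\
\widehat{\nabla_\theta \Phi}
  &= \frac{1}{\alpha N}\sum_{i=1}^{N} (z_i-\tilde\nu_\alpha)\,\nabla_\theta \log f_Z(z_i;\theta)\,\mathbf{1}\{z_i \ge \tilde\nu_\alpha\}.
\end{aligned}
```

The same cancellation goes through when Z = r(X) is a function of an underlying sample X ~ f(x; θ), with the score ∇_θ log f(X; θ) taken with respect to the density of X instead.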

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the CVaR gradient, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator, and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method makes it possible to consider CVaR optimization in new domains. As an example, we consider a reinforcement learning application, and learn a risk-sensitive controller for the game of Tetris.
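
A minimal NumPy sketch of a sampling-based CVaR gradient estimator and a plain SGD loop, assuming α is the tail probability mass (α = 0.05 means the mean of the worst 5% of losses). The function names `cvar_gradient_estimate` and `minimize_cvar_sgd`, the toy problem (X ~ N(θ, 1) with loss Z = X², which is not from the paper), and the step size and sample counts are all hypothetical illustration choices. The paper's actual algorithm, step-size schedule, and bias analysis may involve details not reproduced here.

```python
import numpy as np


def cvar_gradient_estimate(z, score, alpha):
    """Likelihood-ratio style estimate of the CVaR gradient from N i.i.d. samples.

    z     : (N,) sampled losses
    score : (N, d) score vectors, grad_theta log f(sample; theta)
    alpha : tail probability mass (assumption: 0.05 -> worst 5% of losses)
    """
    n = z.shape[0]
    # Empirical VaR: the (1 - alpha)-quantile of the sampled losses.
    var_hat = np.quantile(z, 1.0 - alpha)
    tail = z >= var_hat
    # (1 / (alpha * N)) * sum over tail samples of (z_i - VaR) * score_i
    weights = (z[tail] - var_hat)[:, None]
    return (weights * score[tail]).sum(axis=0) / (alpha * n)


def minimize_cvar_sgd(theta0, alpha=0.05, n_samples=2000, step=0.05,
                      n_iters=200, seed=0):
    """Hypothetical toy problem: X ~ N(theta, 1), loss Z = X**2.

    CVaR_alpha(Z) is convex and symmetric in theta, minimized at theta = 0.
    """
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    for _ in range(n_iters):
        x = rng.normal(theta, 1.0, size=n_samples)
        z = x ** 2
        # Score of N(theta, 1) with respect to theta is (x - theta).
        score = (x - theta)[:, None]
        # Fixed-step descent on the estimated gradient (an illustrative choice).
        theta -= step * cvar_gradient_estimate(z, score, alpha)[0]
    return theta


if __name__ == "__main__":
    # Sanity check against a closed form: Z ~ N(0, s^2) at s = 1, where
    # d/ds CVaR_alpha = pdf(Phi^{-1}(1 - alpha)) / alpha, about 2.063 for alpha = 0.05.
    rng = np.random.default_rng(1)
    z = rng.normal(0.0, 1.0, size=200_000)
    score_s = (z ** 2 - 1.0)[:, None]  # d/ds log N(z; 0, s^2) evaluated at s = 1
    print("estimated dCVaR/ds:", cvar_gradient_estimate(z, score_s, 0.05)[0])
    print("SGD theta starting from 2.0:", minimize_cvar_sgd(2.0))
```

The scale-parameter check is used because it has an analytic answer: for Z ~ N(0, s²) the CVaR equals s times a constant, so its derivative in s is that constant. The toy SGD loop takes the score with respect to the density of X rather than of Z, which the conditional-expectation identity permits when Z = r(X).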

Code implementations: 1 repo