Optimizing the CVaR via Sampling
This work addresses risk management in stochastic optimization, with a focus on reinforcement learning applications; the contribution is incremental, as it builds on existing CVaR and gradient methods.
The paper tackles the problem of optimizing the Conditional Value at Risk (CVaR) by deriving a new gradient formula and a sampling-based estimator. This enables risk-sensitive reinforcement learning in domains such as Tetris, with proven convergence to a local optimum.
Conditional Value at Risk (CVaR) is a prominent risk measure that is widely used in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the CVaR gradient, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method makes it possible to optimize the CVaR in new domains. As an example, we consider a reinforcement learning application and learn a risk-sensitive controller for the game of Tetris.
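A minimal sketch of the idea follows. It assumes a scalar parameter θ, a continuous loss Z with density f_Z(z; θ), and the convention that CVaR is the mean of the upper α-tail of losses, Φ(θ) = E[Z | Z ≥ ν_α(θ)], where ν_α(θ) is the VaR with P(Z ≥ ν_α(θ)) = α. The paper's exact conventions may differ. Under these assumptions, differentiating gives a conditional expectation of likelihood-ratio form:

$$\frac{\partial \Phi(\theta)}{\partial \theta} = \mathbb{E}\left[ \big(Z - \nu_\alpha(\theta)\big)\, \frac{\partial \log f_Z(Z;\theta)}{\partial \theta} \;\middle|\; Z \ge \nu_\alpha(\theta) \right]$$

The terms from the moving quantile cancel because the tail mass α does not depend on θ. The code replaces ν_α with the empirical (1 − α) quantile of N samples and averages over the tail samples. The function names and the Gaussian test case are hypothetical illustrations, not the paper's code.

```python
import random


def cvar_gradient_estimate(samples, scores, alpha):
    """Sampling-based estimate of dCVaR_alpha/dtheta for a scalar parameter.

    samples: i.i.d. loss draws Z_i under the current theta.
    scores:  d/dtheta log f(Z_i; theta), one per sample, in the same order.
    alpha:   tail probability in (0, 1); CVaR here is E[Z | Z >= VaR]
             (upper tail of losses), an assumed convention.
    """
    n = len(samples)
    if n == 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need at least one sample and 0 < alpha < 1")
    # Empirical VaR: the (1 - alpha) quantile of the sampled losses.
    nu_hat = sorted(samples)[int((1.0 - alpha) * n)]
    # Average (Z - VaR) * score over the tail, normalised by alpha * n.
    total = sum((z - nu_hat) * s for z, s in zip(samples, scores) if z >= nu_hat)
    return total / (alpha * n)


def gaussian_mean_example(theta=0.5, sigma=1.0, alpha=0.05, n=200_000, seed=0):
    """Hypothetical test case: Z ~ N(theta, sigma^2).

    The score with respect to theta is (z - theta) / sigma^2, and the true
    upper-tail CVaR is theta + const * sigma, so its derivative in theta is 1.
    """
    rng = random.Random(seed)
    zs = [rng.gauss(theta, sigma) for _ in range(n)]
    scores = [(z - theta) / sigma**2 for z in zs]
    return cvar_gradient_estimate(zs, scores, alpha)


if __name__ == "__main__":
    # Should print a value close to 1.0.
    print(gaussian_mean_example())
```

For a Gaussian loss, shifting the mean by θ shifts the CVaR by θ, so the exact gradient is 1 and the estimate should land near it. Because ν_α is estimated from the same samples, the estimator carries a finite-sample bias; a stochastic gradient descent loop would call it once per iteration with fresh samples.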