OC AIMay 11, 2018

Stochastic Approximation for Risk-aware Markov Decision Processes

arXiv:1805.04238v413.822 citations

Originality Incremental advance

AI Analysis

This work addresses risk-sensitive decision-making in reinforcement learning, which is incremental as it extends existing methods to incorporate various risk measures with proven convergence guarantees.

The authors tackled the problem of solving risk-aware Markov decision processes by developing a stochastic approximation algorithm with two loops for computing risk and learning optimal policies, achieving a convergence rate of Ω((ln(1/δε)/ε²)^{1/k} + (ln(1/ε))^{1/(1-k)}) with probability at least 1-δ for error tolerance ε.

We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs $Q$-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g. conditional value-at-risk, optimized certainty equivalent, and absolute semi-deviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance $ε>0$ for the optimal $Q$-value estimation gap and learning rate $k\in(1/2,\,1]$, the overall convergence rate of our algorithm is $Ω((\ln(1/δε)/ε^{2})^{1/k}+(\ln(1/ε))^{1/(1-k)})$ with probability at least $1-δ$.

View on arXiv PDF

Similar