Central-limit approach to risk-aware Markov decision processes
The paper addresses risk minimization in sequential decision-making and is aimed at reinforcement learning practitioners, but its contribution is incremental: it extends existing central-limit ideas to risk-aware MDPs.
This paper proposes a central-limit-based approach to minimize risk in Markov decision processes over long time horizons, applicable to both known and unknown transition probabilities. The method includes a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.
Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated with a given policy over a sufficiently long time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.
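
To make the central-limit evaluation concrete, the following is a minimal sketch for the known-transition case, not the paper's algorithm. It rests on several assumptions that the abstract does not state: the policy is fixed, so the MDP reduces to an ergodic Markov chain with transition matrix P; the reward depends only on the current state; and "risk" is taken to be the probability that the cumulative reward over the horizon falls below a threshold. The function names are hypothetical. Under these assumptions the cumulative reward over T steps is approximately normal with mean T*mu and variance T*sigma^2, where mu is the stationary mean reward and sigma^2 is the asymptotic variance obtained from the fundamental matrix of the chain.

```python
import math
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of an ergodic transition matrix P (rows sum to 1)."""
    n = P.shape[0]
    # Solve pi (P - I) = 0 together with sum(pi) = 1 in the least-squares sense.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def clt_parameters(P, r):
    """Per-step mean mu and asymptotic variance sigma^2 of the reward process."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    mu = float(pi @ r)
    rc = r - mu  # centered reward
    # Fundamental matrix Z = (I - P + 1 pi^T)^{-1}.
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    # sigma^2 = 2 <rc, Z rc>_pi - <rc, rc>_pi
    sigma2 = float(2.0 * pi @ (rc * (Z @ rc)) - pi @ (rc * rc))
    return mu, sigma2


def shortfall_probability(P, r, horizon, threshold):
    """Normal approximation of P(sum of rewards over `horizon` steps <= threshold).

    This risk measure is an illustrative assumption, not taken from the paper.
    """
    mu, sigma2 = clt_parameters(P, r)
    if sigma2 <= 0.0:
        raise ValueError("asymptotic variance must be positive for the CLT approximation")
    z = (threshold - horizon * mu) / math.sqrt(sigma2 * horizon)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical two-state chain induced by some fixed policy.
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    r = np.array([1.0, -1.0])
    print(shortfall_probability(P, r, horizon=1000, threshold=300.0))
```

When the transition probabilities are unknown, one plausible variant is to estimate P and r from observed trajectories and plug the estimates into the same formulas; whether this matches the paper's treatment of the unknown case is not stated in the abstract.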