OC SY SY

Dec 2, 2015

Central-limit approach to risk-aware Markov decision processes

arXiv:1512.00583

2 citations

h-index: 42

Originality: Synthesis-oriented

AI Analysis

The paper addresses risk minimization in sequential decision-making, a problem relevant to reinforcement learning practitioners, but the contribution is incremental: it extends existing central-limit ideas to risk-aware MDPs.

This paper proposes a central-limit-based approach to minimize risk in Markov decision processes over long time horizons, applicable to both known and unknown transition probabilities. The method includes a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.

Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated with a given policy over a sufficiently long time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.
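To make the central-limit idea concrete, here is a minimal Python sketch, not the paper's algorithm. It assumes a fixed policy that induces an ergodic finite Markov chain with known transition matrix `P` and per-state reward vector `r`, and it takes the shortfall probability P(S_T <= c) of the cumulative reward S_T as the risk measure. That choice of risk measure, and the function names `stationary_distribution`, `clt_parameters` and `shortfall_probability`, are illustrative assumptions. The sketch uses the standard Markov-chain central limit theorem, under which S_T is approximately normal with mean T*mu and variance T*sigma^2 for large T.

```python
import math
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of an ergodic transition matrix P (rows sum to 1)."""
    n = P.shape[0]
    # Stack the balance equations pi (P - I) = 0 with the normalization sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def clt_parameters(P, r):
    """Long-run mean reward mu and asymptotic variance sigma^2 per step."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    mu = pi @ r
    f = r - mu  # centered reward, so that pi @ f = 0
    # Fundamental matrix Z = (I - P + 1 pi^T)^{-1}; h = Z f solves the
    # Poisson equation (I - P) h = f with pi @ h = 0.
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    h = Z @ f
    # sigma^2 = 2 * pi(f * h) - pi(f^2): stationary variance plus twice the
    # sum of all lagged autocovariances.
    sigma2 = pi @ (2.0 * f * h - f ** 2)
    return mu, sigma2

def shortfall_probability(P, r, T, c):
    """Normal approximation of P(S_T <= c); assumes sigma^2 > 0 and large T."""
    mu, sigma2 = clt_parameters(P, r)
    z = (c - T * mu) / math.sqrt(T * sigma2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical two-state chain that tends to stay in its current state.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
r = np.array([0.0, 1.0])
mu, sigma2 = clt_parameters(P, r)
risk = shortfall_probability(P, r, T=1000, c=450.0)
```

For this hypothetical chain the sketch gives mu = 0.5 and sigma^2 = 2.25, nine times the 0.25 obtained when the two states are drawn independently each step, which illustrates how temporal correlation inflates long-horizon risk at the same expected reward. When transition probabilities are unknown, one plausible variant (again an assumption, not a description of the paper's method) is to replace `P` with an estimate from observed transitions, or to estimate mu and sigma^2 directly from sampled trajectories.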
