OC · SY · SY · Dec 2, 2015

Central-limit approach to risk-aware Markov decision processes

arXiv:1512.00583 · 2 citations · h-index: 42
Originality: Synthesis-oriented
AI Analysis

The paper addresses risk minimization in sequential decision-making, a topic relevant to reinforcement learning practitioners, but its contribution is incremental: it extends existing central-limit ideas to risk-aware MDPs.

This paper proposes a central-limit-based approach to minimize risk in Markov decision processes over long time horizons, applicable to both known and unknown transition probabilities. The method includes a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.

Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated with a given policy over a long-enough time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that converges to a local optimum of the risk objective.
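
To make the central-limit idea concrete, here is a minimal sketch for the known-transition case. It is not the paper's algorithm: the abstract does not specify the risk measure, so the sketch assumes risk means the probability that the cumulative reward over the horizon falls below a threshold. It also assumes a fixed policy has already been applied, so that `P` is the resulting ergodic state transition matrix and `r` a state-dependent reward vector. All function names are hypothetical.

```python
import math
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi, sum(pi) = 1 by least squares (assumes an ergodic chain)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def clt_parameters(P, r):
    """Return the per-step mean and asymptotic variance of the reward process."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    mu = float(pi @ r)
    r_centered = r - mu
    # Fundamental matrix; h solves the Poisson equation (I - P) h = r - mu.
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    h = Z @ r_centered
    # Asymptotic variance appearing in the Markov chain central limit theorem.
    sigma2 = float(2.0 * pi @ (r_centered * h) - pi @ (r_centered ** 2))
    return mu, sigma2

def shortfall_probability(P, r, horizon, threshold):
    """Normal approximation of P(sum of rewards over `horizon` steps < threshold).

    Assumption: this shortfall probability stands in for the risk objective.
    """
    mu, sigma2 = clt_parameters(P, r)
    z = (threshold - horizon * mu) / math.sqrt(horizon * sigma2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For example, `shortfall_probability(P, r, horizon=100, threshold=50.0)` approximates the chance of collecting less than 50 units of reward in 100 steps. The approximation only becomes accurate for long horizons, which matches the "long-enough time horizon" condition in the abstract.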

