AI · Nov 15, 2017

Quantile Markov Decision Process

Xiaocheng Li, Huaiyang Zhong, Margaret L. Brandeau

arXiv:1711.05788v54.42 citations

Originality: Incremental advance

AI Analysis

This work addresses decision-making under risk in applications such as healthcare, where optimizing a specific quantile of the reward (for example, a low quantile that captures poor outcomes) can be more relevant than optimizing the average. The advance is incremental, since it adapts the existing MDP framework to a quantile objective.

The paper optimizes quantiles of the cumulative reward in a Markov decision process (MDP) rather than its expectation. It introduces a quantile MDP (QMDP) model, gives analytical results on the optimal value function, presents a dynamic programming algorithm for computing optimal policies, and applies the model to HIV treatment initiation.

The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper we consider the problem of optimizing the quantiles of the cumulative rewards of a Markov decision process (MDP), which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk (CVaR) objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, where patients aim to balance the potential benefits and risks of the treatment.
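To make the difference between the expectation and quantile objectives concrete, here is a minimal Python sketch. It is not the paper's dynamic programming algorithm: it only evaluates two fixed policies on a hypothetical two-state toy MDP by enumerating the distribution of the cumulative reward. The toy MDP, the transition format, and the names `return_distribution`, `quantile`, and `mean` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy MDP (an assumption for illustration, not from the paper).
# P[state][action] is a list of (next_state, probability, reward) triples.
P = {
    "start": {
        "safe": [("end", 1.0, 5.0)],
        "risky": [("end", 0.7, 10.0), ("end", 0.3, 0.0)],
    },
    "end": {"stay": [("end", 1.0, 0.0)]},
}


def return_distribution(P, policy, s0, horizon):
    """Distribution of the cumulative reward under a fixed policy.

    Propagates probability mass over (state, reward-so-far) pairs for
    `horizon` steps, then marginalizes out the final state.
    """
    dist = {(s0, 0.0): 1.0}
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (s, g), p in dist.items():
            for s2, q, r in P[s][policy[s]]:
                nxt[(s2, g + r)] += p * q
        dist = nxt
    out = defaultdict(float)
    for (_, g), p in dist.items():
        out[g] += p
    return dict(out)


def quantile(dist, tau):
    """Lower tau-quantile: the smallest g with P(G <= g) >= tau."""
    cum = 0.0
    for g in sorted(dist):
        cum += dist[g]
        if cum >= tau - 1e-12:  # small tolerance for float round-off
            return g
    return max(dist)


def mean(dist):
    """Expected cumulative reward."""
    return sum(g * p for g, p in dist.items())


safe = return_distribution(P, {"start": "safe", "end": "stay"}, "start", 2)
risky = return_distribution(P, {"start": "risky", "end": "stay"}, "start", 2)

# The risky policy has the higher mean, but the safe policy has the
# higher 0.1-quantile, so the two objectives rank the policies differently.
print("mean:", mean(safe), mean(risky))
print("0.1-quantile:", quantile(safe, 0.1), quantile(risky, 0.1))
```

In this toy problem an expectation-maximizing decision maker would choose the risky action, while one maximizing the 0.1-quantile would choose the safe action. Brute-force evaluation like this only works for tiny problems and fixed policies; searching over policies efficiently is the problem the paper's algorithm is designed to address.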
