AI SY | Oct 25, 2021

Planning for Risk-Aversion and Expected Value in MDPs

Marc Rigter, Paul Duckworth, Bruno Lacerda, Nick Hawes

arXiv:2110.12746v2 | 13.013 citations

Originality: Incremental advance

AI Analysis

This work addresses the trade-off between risk and expected value in MDP planning, which is important for applications requiring robust decision-making, but it is incremental as it builds on existing risk-averse methods.

The paper balances risk-aversion and expected performance in Markov decision processes (MDPs) with a lexicographic approach: it minimizes the expected cost subject to the conditional value at risk (CVaR) of the total cost being optimal. Across four domains, it achieves a lower expected cost than the state-of-the-art algorithm while still attaining the optimal CVaR.

Planning in Markov decision processes (MDPs) typically optimises the expected cost. However, optimising the expectation does not consider the risk that for any given run of the MDP, the total cost received may be unacceptably high. An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR). However, optimising the CVaR alone may result in poor performance in expectation. In this work, we begin by showing that there can be multiple policies which obtain the optimal CVaR. This motivates us to propose a lexicographic approach which minimises the expected cost subject to the constraint that the CVaR of the total cost is optimal. We present an algorithm for this problem and evaluate our approach on four domains. Our results demonstrate that our lexicographic approach improves the expected cost compared to the state-of-the-art algorithm, while achieving the optimal CVaR.
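The lexicographic objective can be illustrated with a minimal sketch. This is not the paper's algorithm: it only ranks a small, hand-made set of candidate policies by their sampled total costs. The policy names, the cost samples, the helper names `cvar` and `lexicographic_choice`, and the convention that CVaR at level alpha is the mean of the worst alpha-fraction of costs are all assumptions made for illustration.

```python
import math


def cvar(costs, alpha):
    """Empirical CVaR: mean of the worst ceil(alpha * n) costs (higher cost is worse)."""
    k = max(1, math.ceil(alpha * len(costs)))
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k


def mean(costs):
    return sum(costs) / len(costs)


def lexicographic_choice(policies, alpha, tol=1e-9):
    """Among policies with the lowest CVaR, return the name with the lowest mean cost."""
    best_cvar = min(cvar(c, alpha) for c in policies.values())
    tied = [name for name, c in policies.items() if cvar(c, alpha) <= best_cvar + tol]
    return min(tied, key=lambda name: mean(policies[name]))


# Hypothetical total-cost samples, four equally likely runs per policy.
policies = {
    "risk_neutral": [0, 0, 0, 18],  # lowest mean, but a bad tail
    "cvar_opt_a": [4, 4, 8, 8],     # optimal CVaR, higher mean
    "cvar_opt_b": [2, 2, 8, 8],     # optimal CVaR, lower mean
}

choice = lexicographic_choice(policies, alpha=0.5)
```

In this toy data both `cvar_opt_a` and `cvar_opt_b` attain the same CVaR, which mirrors the observation that several policies can be CVaR-optimal; the lexicographic rule then breaks the tie by expected cost and selects `cvar_opt_b`, even though `risk_neutral` has the lowest mean overall.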
