Risk-Averse Bayes-Adaptive Reinforcement Learning
This work addresses risk management in reinforcement learning for applications that require safety under uncertainty. It appears incremental, however, because it builds on existing Bayes-adaptive and CVaR methods.
The paper tackles risk-averse Bayes-adaptive reinforcement learning by optimizing the conditional value at risk (CVaR) of the total return in Bayes-adaptive MDPs. It shows that a policy optimizing this objective is risk-averse to both parametric uncertainty (from the prior over MDPs) and internal uncertainty (from the stochasticity of each MDP). Its experiments report significantly better performance than baseline approaches.
In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
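To make the objective concrete, the following is a minimal sketch of an empirical CVaR estimate over sampled total returns. It is not the paper's algorithm (which uses a stochastic-game reformulation with Monte Carlo tree search and Bayesian optimisation). The function names `empirical_cvar` and `sample_returns`, the uniform prior, and the Bernoulli reward model are all assumptions made purely for illustration. Each simulated episode first draws an unknown MDP parameter from the prior (parametric uncertainty) and then accumulates random rewards under that parameter (internal uncertainty), so the lower tail of the return distribution reflects both sources.

```python
import math
import random


def empirical_cvar(returns, alpha):
    """Average of the worst ceil(alpha * n) returns (the lower tail).

    alpha is the tail fraction in (0, 1]; alpha = 1 recovers the plain mean.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # ascending, so the worst returns come first
    k = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


def sample_returns(n_episodes, horizon, rng):
    """Hypothetical toy model, not taken from the paper.

    Each episode draws a success probability p from a uniform prior
    (parametric uncertainty), then sums `horizon` Bernoulli(p) rewards
    (internal uncertainty).
    """
    totals = []
    for _ in range(n_episodes):
        p = rng.random()
        total = sum(1.0 if rng.random() < p else 0.0 for _ in range(horizon))
        totals.append(total)
    return totals


rng = random.Random(0)
returns = sample_returns(n_episodes=1000, horizon=10, rng=rng)
mean_return = sum(returns) / len(returns)
cvar_10 = empirical_cvar(returns, alpha=0.1)
print(f"mean return: {mean_return:.2f}, CVaR at 10%: {cvar_10:.2f}")
```

Because the CVaR averages only the worst outcomes, it is never larger than the mean. A risk-averse agent compares policies by this tail average rather than by the expected return.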