LG ML

Feb 21, 2023

Minimax-Bayes Reinforcement Learning

Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya Grover, Emilio Jorge

arXiv:2302.10831v1

10.78 citations

h-index: 28

Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of decision-making under uncertainty in reinforcement learning, offering incremental improvements in policy robustness.

The paper addresses prior selection in Bayesian reinforcement learning by studying minimax-Bayes solutions. It finds that the worst-case prior varies by setting, and that the corresponding minimax policies are more robust than policies that assume a standard uniform prior.

While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
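To make the idea of a worst-case prior concrete, the following is a minimal sketch on a toy problem, not the method used in the paper. It assumes only two candidate models, three fixed policies, and a made-up utility table; the names `UTILITY`, `bayes_value`, and `worst_case_prior` are hypothetical. For a prior that puts probability p on the first model, the Bayes-optimal value is the best expected utility over policies, and the worst-case prior is the p that minimises that value (found here by a simple grid search).

```python
# Toy utility table (hypothetical numbers): UTILITY[policy][model].
UTILITY = [
    [1.0, 0.0],  # policy 0: good in model 0, useless in model 1
    [0.0, 0.5],  # policy 1: useless in model 0, decent in model 1
    [0.3, 0.3],  # policy 2: same modest utility in both models
]


def bayes_value(p, utility=UTILITY):
    """Expected utility of the Bayes-optimal policy under the prior (p, 1 - p)."""
    return max(p * u0 + (1.0 - p) * u1 for u0, u1 in utility)


def worst_case_prior(utility=UTILITY, grid_size=1001):
    """Grid search over p in [0, 1] for the prior minimising the Bayes-optimal value."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(((p, bayes_value(p, utility)) for p in grid), key=lambda pv: pv[1])


# Worst-case prior and its value, compared with the value under a uniform prior.
p_star, v_star = worst_case_prior()
v_uniform = bayes_value(0.5)
```

In this toy table the worst-case prior is not uniform: it sits near p = 1/3, where the two specialised policies have equal expected utility. The Bayes-optimal value there is lower than under the uniform prior, so a uniform prior gives an optimistic picture of what the decision maker can guarantee.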
