LG, ML
Nov 15, 2018

Tight Bayesian Ambiguity Sets for Robust MDPs

arXiv:1811.06512v1
9 citations
Originality: Incremental advance
AI Analysis

This work addresses robustness in reinforcement learning for uncertain environments. It offers a more practical approach to robust MDPs (RMDPs), though the advance is incremental because it builds on existing ambiguity-set methods.

The paper tackles overly conservative solutions in robust Markov decision processes (RMDPs). It proposes RSVF, a method that uses a Bayesian prior and optimized ambiguity sets to obtain less conservative policies while keeping the same worst-case guarantees. The empirical results suggest that the approach is promising in practice.

Robustness is important for sequential decision making in a stochastic dynamic environment with uncertain probabilistic parameters. We address the problem of using robust MDPs (RMDPs) to compute policies with provable worst-case guarantees in reinforcement learning. The quality and robustness of an RMDP solution are determined by its ambiguity set. Existing methods construct ambiguity sets that lead to impractically conservative solutions. In this paper, we propose RSVF, which achieves less conservative solutions with the same worst-case guarantees by 1) leveraging a Bayesian prior, 2) optimizing the size and location of the ambiguity set, and, most importantly, 3) relaxing the requirement that the set is a confidence interval. Our theoretical analysis shows the safety of RSVF, and the empirical results demonstrate its practical promise.
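To make the notion of an ambiguity set concrete, here is a minimal sketch of the confidence-region style of construction that the abstract contrasts RSVF against. It is not RSVF itself. Everything specific is an assumption made for illustration: a Dirichlet posterior over the next-state distribution of a single state-action pair, an L1 ball centred at the posterior mean, a radius chosen as the (1 - delta) quantile of posterior sample distances, and the hypothetical function names `bayesian_l1_set` and `worst_case_l1`.

```python
import numpy as np


def bayesian_l1_set(counts, prior_alpha, delta, n_samples, rng):
    """Build an L1-ball ambiguity set from a Dirichlet posterior (illustrative assumption).

    Returns the centre (posterior sample mean) and a radius psi such that
    roughly a (1 - delta) fraction of the posterior samples lie inside the ball.
    """
    samples = rng.dirichlet(np.asarray(prior_alpha) + np.asarray(counts), size=n_samples)
    p_bar = samples.mean(axis=0)
    dists = np.abs(samples - p_bar).sum(axis=1)  # L1 distance of each sample to the centre
    psi = float(np.quantile(dists, 1.0 - delta))
    return p_bar, psi


def worst_case_l1(v, p_bar, psi):
    """Minimise p @ v over distributions p with ||p - p_bar||_1 <= psi.

    Greedy solution: shift up to psi / 2 probability mass onto the
    lowest-value state, taking it from the highest-value states first.
    """
    v = np.asarray(v, dtype=float)
    p = np.array(p_bar, dtype=float)
    i_min = int(np.argmin(v))
    eps = min(psi / 2.0, 1.0 - p[i_min])  # mass that can actually be moved
    p[i_min] += eps
    for i in np.argsort(v)[::-1]:  # states in order of decreasing value
        if eps <= 0.0:
            break
        if i == i_min:
            continue
        take = min(eps, p[i])
        p[i] -= take
        eps -= take
    return float(p @ v), p
```

A robust Bellman update would call `worst_case_l1` with the current value function for each state-action pair. The larger the radius `psi`, the more pessimistic the resulting value, which is the conservativeness the abstract refers to.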
