LGJun 20, 2020
Entropic Risk Constrained Soft-Robust Policy OptimizationReazul Hasan Russel, Bahram Behzadian, Marek Petrik
Having a perfect model to compute the optimal policy is often infeasible in reinforcement learning. It is important in high-stakes domains to quantify and manage risk induced by model uncertainties. Entropic risk measure is an exponential utility-based convex risk measure that satisfies many reasonable properties. In this paper, we propose an entropic risk constrained policy gradient and actor-critic algorithms that are risk-averse to the model uncertainty. We demonstrate the usefulness of our algorithms on several problem domains.
LGDec 4, 2019
Optimizing Norm-Bounded Weighted Ambiguity Sets for Robust MDPsReazul Hasan Russel, Bahram Behzadian, Marek Petrik
Optimal policies in Markov decision processes (MDPs) are very sensitive to model misspecification. This raises serious concerns about deploying them in high-stake domains. Robust MDPs (RMDP) provide a promising framework to mitigate vulnerabilities by computing policies with worst-case guarantees in reinforcement learning. The solution quality of an RMDP depends on the ambiguity set, which is a quantification of model uncertainties. In this paper, we propose a new approach for optimizing the shape of the ambiguity sets for RMDPs. Our method departs from the conventional idea of constructing a norm-bounded uniform and symmetric ambiguity set. We instead argue that the structure of a near-optimal ambiguity set is problem specific. Our proposed method computes a weight parameter from the value functions, and these weights then drive the shape of the ambiguity sets. Our theoretical analysis demonstrates the rationale of the proposed idea. We apply our method to several different problem domains, and the empirical results further furnish the practical promise of weighted near-optimal ambiguity sets.
LGOct 23, 2019
Optimizing Percentile Criterion Using Robust MDPsBahram Behzadian, Reazul Hasan Russel, Marek Petrik et al.
We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the \emph{percentile criterion}, can be optimized using Robust MDPs~(RMDPs). RMDPs generalize MDPs to allow for uncertain transition probabilities chosen adversarially from given ambiguity sets. We show that the RMDP solution's sub-optimality depends on the spans of the ambiguity sets along the value function. We then propose new algorithms that minimize the span of ambiguity sets defined by weighted $L_1$ and $L_\infty$ norms. Our primary focus is on Bayesian guarantees, but we also describe how our methods apply to frequentist guarantees and derive new concentration inequalities for weighted $L_1$ and $L_\infty$ norms. Experimental results indicate that our optimized ambiguity sets improve significantly on prior construction methods.
ROApr 2, 2015
Monte Carlo Localization in Hand-Drawn MapsBahram Behzadian, Pratik Agarwal, Wolfram Burgard et al.
Robot localization is a one of the most important problems in robotics. Most of the existing approaches assume that the map of the environment is available beforehand and focus on accurate metrical localization. In this paper, we address the localization problem when the map of the environment is not present beforehand, and the robot relies on a hand-drawn map from a non-expert user. We addressed this problem by expressing the robot pose in the pixel coordinate and simultaneously estimate a local deformation of the hand-drawn map. Experiments show that we are able to localize the robot in the correct room with a robustness up to 80%