LG AI ME ML

Jun 23, 2020

Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Aaron Sonabend-W, Junwei Lu, Leo A. Celi, Tianxi Cai, Peter Szolovits

arXiv:2006.13189v29.627 citations

Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of deploying offline RL policies in practical applications by improving interpretability and adding uncertainty measures. The advance appears incremental, since it builds on existing methods such as PSRL.

The paper tackles the challenge of offline reinforcement learning by proposing an Expert-Supervised RL framework that learns safe and optimal policies with uncertainty quantification, providing theoretical guarantees and sample efficiency independent of risk aversion and behavior policy quality.

Offline Reinforcement Learning (RL) is a promising approach for learning optimal policies in environments where direct exploration is expensive or unfeasible. However, the adoption of such policies in practice is often challenging, as they are hard to interpret within the application context, and lack measures of uncertainty for the learned policy value and its decisions. To overcome these issues, we propose an Expert-Supervised RL (ESRL) framework which uses uncertainty quantification for offline policy learning. In particular, we have three contributions: 1) the method can learn safe and optimal policies through hypothesis testing, 2) ESRL allows for different levels of risk-averse implementations tailored to the application context, and finally, 3) we propose a way to interpret ESRL's policy at every state through posterior distributions, and use this framework to compute off-policy value function posteriors. We provide theoretical guarantees for our estimators and regret bounds consistent with Posterior Sampling for RL (PSRL). Sample efficiency of ESRL is independent of the chosen risk aversion threshold and quality of the behavior policy.
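The hypothesis-testing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that posterior draws of the action values at a single state are already available, and it uses a simple decision rule (deviate from the behavior policy only when the posterior probability that the behavior action is at least as good falls below a threshold `alpha`). The function names, the Gaussian stand-in posterior in `draw_q_samples`, and the exact form of the test are all hypothetical choices made for illustration.

```python
import random


def posterior_null_probability(q_samples, behavior_action, candidate_action):
    """Fraction of posterior draws in which the behavior action is at least
    as good as the candidate action (the 'null hypothesis' in this sketch)."""
    wins = sum(1 for q in q_samples if q[behavior_action] >= q[candidate_action])
    return wins / len(q_samples)


def choose_action(q_samples, behavior_action, alpha):
    """Return (action, p_null) for one state.

    q_samples: list of posterior draws, each a list of Q-values per action.
    alpha: risk-aversion threshold; smaller values deviate from the
           behavior policy less often.
    """
    n_actions = len(q_samples[0])
    n_draws = len(q_samples)
    # Candidate = action with the highest posterior-mean Q-value.
    means = [sum(q[a] for q in q_samples) / n_draws for a in range(n_actions)]
    candidate = max(range(n_actions), key=lambda a: means[a])
    if candidate == behavior_action:
        return behavior_action, 1.0
    p_null = posterior_null_probability(q_samples, behavior_action, candidate)
    # Deviate only when the evidence against the behavior action is strong.
    action = candidate if p_null < alpha else behavior_action
    return action, p_null


def draw_q_samples(means, stds, n_draws, seed=0):
    """Hypothetical stand-in posterior: independent Gaussians per action."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n_draws)]


if __name__ == "__main__":
    # Well-separated values: the test should favor the candidate action.
    confident = draw_q_samples([1.0, 5.0], [0.1, 0.1], n_draws=2000)
    print(choose_action(confident, behavior_action=0, alpha=0.05))

    # Overlapping posteriors: the sketch falls back to the behavior action.
    uncertain = draw_q_samples([1.0, 1.1], [2.0, 2.0], n_draws=2000)
    print(choose_action(uncertain, behavior_action=0, alpha=0.05))
```

In this sketch `alpha` plays the role of the risk-aversion level mentioned in the abstract, and the returned `p_null` is a per-state quantity that a practitioner could inspect to see how strongly the posterior supports deviating from the behavior policy.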
