LG Feb 12, 2021

Scalable Bayesian Inverse Reinforcement Learning

arXiv:2102.06483v224.084 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses Bayesian inference for inverse reinforcement learning in high-stakes domains such as healthcare, where methods must be both offline and scalable. Its contribution is an incremental improvement on existing techniques.

The paper tackles the scalability and offline-learning challenges of Bayesian inverse reinforcement learning by introducing AVRIL, a method that jointly learns an approximate posterior over rewards and a policy without environment interaction. It achieves competitive task performance on medical data and control simulations.

Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the inverse reinforcement learning problem. Unfortunately, current methods generally do not scale well beyond the small tabular setting because they need an inner-loop MDP solver. Even non-Bayesian methods that do scale often require extensive interaction with the environment to perform well, which makes them inappropriate for high-stakes or costly applications such as healthcare. In this paper we introduce our method, Approximate Variational Reward Imitation Learning (AVRIL), which addresses both of these issues. Through a variational approach to the latent reward, it jointly learns an approximate posterior distribution over the reward, one that scales to arbitrarily complicated state spaces, alongside an appropriate policy in a completely offline manner. Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.
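To make "jointly learning an approximate posterior over the reward alongside a policy" more concrete, the following is a minimal sketch of what a per-transition variational objective of this kind could look like. It is not the authors' implementation, and none of its specifics come from the abstract. It assumes a Boltzmann-rational demonstrator (softmax over Q-values), a Gaussian approximate posterior over the reward, a standard normal prior, and a consistency term tying the reward to the one-step temporal difference of Q. All function names and the parameters `beta`, `gamma`, and `lam` are hypothetical.

```python
import math

def log_softmax(values, beta):
    # Numerically stable log of a Boltzmann policy over action values.
    scaled = [beta * v for v in values]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]

def gaussian_log_pdf(x, mean, std):
    # Log density of N(mean, std^2) evaluated at x.
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def kl_to_standard_normal(mean, std):
    # KL( N(mean, std^2) || N(0, 1) ) in closed form.
    return 0.5 * (std ** 2 + mean ** 2 - 1.0) - math.log(std)

def transition_objective(q_s, a, q_next, a_next, r_mean, r_std,
                         beta=1.0, gamma=0.99, lam=1.0):
    """Objective (to be maximised) for one demonstrated transition (s, a, s', a').

    q_s, q_next : lists of Q-values for every action in s and s' (assumed given)
    r_mean, r_std : parameters of the assumed Gaussian reward posterior at (s, a)
    """
    # 1) Imitation term: log-likelihood of the demonstrated action.
    log_policy = log_softmax(q_s, beta)[a]
    # 2) Regulariser: keep the reward posterior close to the prior.
    kl = kl_to_standard_normal(r_mean, r_std)
    # 3) Consistency: the reward implied by Q should be likely under the posterior.
    implied_reward = q_s[a] - gamma * q_next[a_next]
    consistency = gaussian_log_pdf(implied_reward, r_mean, r_std)
    return log_policy - kl + lam * consistency

# Example: one transition in a two-action problem.
value = transition_objective(q_s=[1.0, 0.0], a=0, q_next=[0.0, 0.0], a_next=0,
                             r_mean=1.0, r_std=1.0, gamma=0.9)
print(value)
```

In a sketch like this, summing the objective over a fixed dataset of demonstrations and maximising it with respect to the Q-function and posterior parameters requires neither an inner-loop MDP solver nor environment interaction, which is the property the abstract emphasises.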
