LG AI | Mar 13, 2023

Kernel Density Bayesian Inverse Reinforcement Learning

Aishwarya Mandyam, Didong Li, Jiayu Yao, Diana Cai, Andrew Jones, Barbara E. Engelhardt

arXiv:2303.06827v4 | 3.81 citations | h-index: 4 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of applying Bayesian IRL in domains such as clinical applications, where demonstration data is scarce, and offers a principled framework for leveraging existing data to improve inference.

The paper addresses a limitation of Bayesian inverse reinforcement learning (IRL): it typically requires large demonstration datasets. The proposed method incorporates domain-specific training task data to improve reward function inference when expert demonstrations are limited. It achieves faster posterior concentration than baselines, particularly in low-data regimes, and comes with the first theoretical guarantees of posterior concentration for a Bayesian IRL algorithm.

Inverse reinforcement learning (IRL) methods infer an agent's reward function using demonstrations of expert behavior. A Bayesian IRL approach models a distribution over candidate reward functions, capturing a degree of uncertainty in the inferred reward function. This is critical in some applications, such as those involving clinical data. Typically, Bayesian IRL algorithms require large demonstration datasets, which may not be available in practice. In this work, we incorporate existing domain-specific data to achieve better posterior concentration rates. We study a common setting in clinical and biological applications where we have access to expert demonstrations and known reward functions for a set of training tasks. Our aim is to learn the reward function of a new test task given limited expert demonstrations. Existing Bayesian IRL methods impose restrictions on the form of input data, thus limiting the incorporation of training task data. To better leverage information from training tasks, we introduce kernel density Bayesian inverse reinforcement learning (KD-BIRL). Our approach employs a conditional kernel density estimator, which uses the known reward functions of the training tasks to improve the likelihood estimation across a range of reward functions and demonstration samples. Our empirical results highlight KD-BIRL's faster concentration rate in comparison to baselines, particularly in low test task expert demonstration data regimes. Additionally, we are the first to provide theoretical guarantees of posterior concentration for a Bayesian IRL algorithm. Taken together, this work introduces a principled and theoretically grounded framework that enables Bayesian IRL to be applied across a variety of domains.
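To make the idea of a conditional kernel density estimator concrete, the following is a minimal sketch and not the authors' implementation. It assumes Gaussian kernels, Euclidean distances, and fixed bandwidths; the function names (`gaussian_kernel`, `conditional_kde_likelihood`) and the parameters `h_sa` and `h_reward` are hypothetical. Each training state-action sample is weighted by how close its known reward is to the candidate reward, so the estimate of p(state-action | reward) borrows strength from training tasks with similar rewards.

```python
import numpy as np


def gaussian_kernel(sq_dist, bandwidth):
    """Unnormalized Gaussian kernel evaluated on squared distances."""
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def conditional_kde_likelihood(query_sa, query_reward, train_sa, train_rewards,
                               h_sa=0.5, h_reward=0.5):
    """Estimate p(state-action | reward), up to a constant factor.

    train_sa:      (n, d) array of training state-action features
    train_rewards: (n, k) array, the known reward vector paired with each sample
    h_sa, h_reward: kernel bandwidths (assumed values, normally tuned)
    """
    # Squared distances from the query to every training sample.
    sa_sq = np.sum((train_sa - query_sa) ** 2, axis=1)
    r_sq = np.sum((train_rewards - query_reward) ** 2, axis=1)

    w_sa = gaussian_kernel(sa_sq, h_sa)      # closeness in state-action space
    w_r = gaussian_kernel(r_sq, h_reward)    # closeness in reward space

    denom = np.sum(w_r)
    if denom == 0.0:
        # Candidate reward is far from every training reward.
        return 0.0
    # Reward-weighted average of the state-action kernel values.
    return float(np.sum(w_sa * w_r) / denom)


# Toy data: two training tasks with scalar rewards 0 and 1, whose
# demonstrations cluster near 0 and near 5 respectively.
train_sa = np.array([[0.0], [0.1], [5.0], [5.1]])
train_rewards = np.array([[0.0], [0.0], [1.0], [1.0]])

query = np.array([0.0])
p_match = conditional_kde_likelihood(query, np.array([0.0]), train_sa, train_rewards)
p_mismatch = conditional_kde_likelihood(query, np.array([1.0]), train_sa, train_rewards)
print(p_match > p_mismatch)
```

In a Bayesian IRL loop, an estimate of this kind would stand in for the likelihood of each test-task demonstration under a candidate reward, and a sampler would combine it with a prior over rewards. Because the kernel here is unnormalized, the values are proportional to a density rather than a density themselves, which is sufficient when only likelihood ratios between candidate rewards are needed.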
