Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood
This work addresses the need for reliable uncertainty quantification in policy evaluation for contextual bandits. The contribution is incremental: it builds on existing inference methods by introducing empirical likelihood to improve robustness in small samples.
The paper tackles policy inference in contextual bandits by developing a Bayesian method based on empirical likelihood. The method provides accurate uncertainty measurements and robust performance in finite-sample regimes, as demonstrated through Monte Carlo simulations and an application to adolescent BMI data.
Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies in finite-sample regimes. The proposed method is robust to small sample sizes and provides accurate uncertainty measurements for policy value evaluation. In addition, it allows for flexible inference on policy comparison with full uncertainty quantification. We demonstrate the effectiveness of the proposed method using Monte Carlo simulations and an application to an adolescent body mass index data set.
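To make the idea concrete, the following is a minimal sketch of a generic Bayesian empirical likelihood computation for the value of a single policy. It is not the paper's procedure, which handles several policies jointly. Everything specific here is an assumption made for illustration: the inverse-propensity-weighted estimating equation, the flat prior, the grid approximation of the posterior, the simulated logged data, and the helper names `log_el_ratio` and `el_posterior`.

```python
import numpy as np

def log_el_ratio(z, theta, iters=100):
    """Log empirical likelihood ratio for the mean constraint E[z] = theta."""
    d = z - theta
    if d.min() >= 0 or d.max() <= 0:
        return -np.inf  # theta lies outside the convex hull of the data
    # The Lagrange multiplier must keep every 1 + lam * d_i positive.
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # g(lam) = sum d / (1 + lam * d) is strictly decreasing in lam.
        if np.sum(d / (1.0 + mid * d)) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -np.sum(np.log1p(lam * d))

def el_posterior(z, grid, log_prior=None):
    """Normalized posterior weights on a grid; flat prior if log_prior is None."""
    log_post = np.array([log_el_ratio(z, t) for t in grid])
    if log_prior is not None:
        log_post = log_post + log_prior(grid)
    w = np.exp(log_post - log_post.max())
    return w / w.sum()

# Simulated logged data (hypothetical): two actions, uniform logging policy.
rng = np.random.default_rng(0)
n = 40
x = rng.normal(size=n)                      # contexts
a = rng.integers(0, 2, size=n)              # logged actions, propensity 0.5
r = rng.normal(loc=0.5 + 0.5 * a * (x > 0), scale=1.0)  # rewards
pi_a = (x > 0).astype(int)                  # target policy: action 1 if x > 0
z = (a == pi_a) / 0.5 * r                   # inverse-propensity-weighted rewards

# Grid strictly inside the range of z, where the empirical likelihood is finite.
grid = np.linspace(z.min(), z.max(), 401)[1:-1]
post = el_posterior(z, grid)
post_mean = np.sum(grid * post)
cdf = np.cumsum(post)
ci = (grid[np.searchsorted(cdf, 0.025)], grid[np.searchsorted(cdf, 0.975)])
print("posterior mean:", post_mean, "95% credible interval:", ci)
```

The grid approximation is adequate for a scalar policy value. Comparing several policies jointly, as the abstract describes, would need a vector-valued constraint and most likely a sampler in place of the grid.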