LG MLFeb 8, 2020

Inference for Batched Bandits

Kelly W. Zhang, Lucas Janson, Susan A. Murphy

arXiv:2002.03217v323.5110 citations

Originality Incremental advance

AI Analysis

This addresses the need for reliable statistical inference in scientific and industrial applications using bandit algorithms, providing a method to correct for biases in adaptively-collected data.

The paper tackled the problem of unreliable inference from adaptively-collected bandit data by proving that ordinary least squares (OLS) fails asymptotically when there is no unique optimal arm, leading to Type-1 error inflation, and introduced the Batched OLS (BOLS) estimator, which is asymptotically normal and robust to non-stationarity in multi-arm and contextual bandits.

As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We first prove that the ordinary least squares estimator (OLS), which is asymptotically normal on independently sampled data, is not asymptotically normal on data collected using standard bandit algorithms when there is no unique optimal arm. This asymptotic non-normality result implies that the naive assumption that the OLS estimator is approximately normal can lead to Type-1 error inflation and confidence intervals with below-nominal coverage probabilities. Second, we introduce the Batched OLS estimator (BOLS) that we prove is (1) asymptotically normal on data collected from both multi-arm and contextual bandits and (2) robust to non-stationarity in the baseline reward.

View on arXiv PDF

Similar