LGMLFeb 8, 2020

Inference for Batched Bandits

arXiv:2002.03217v3108 citations
AI Analysis

This addresses the need for reliable statistical inference in scientific and industrial applications using bandit algorithms, providing a method to correct for biases in adaptively-collected data.

The paper tackled the problem of unreliable inference from adaptively-collected bandit data by proving that ordinary least squares (OLS) fails asymptotically when there is no unique optimal arm, leading to Type-1 error inflation, and introduced the Batched OLS (BOLS) estimator, which is asymptotically normal and robust to non-stationarity in multi-arm and contextual bandits.

As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We first prove that the ordinary least squares estimator (OLS), which is asymptotically normal on independently sampled data, is not asymptotically normal on data collected using standard bandit algorithms when there is no unique optimal arm. This asymptotic non-normality result implies that the naive assumption that the OLS estimator is approximately normal can lead to Type-1 error inflation and confidence intervals with below-nominal coverage probabilities. Second, we introduce the Batched OLS estimator (BOLS) that we prove is (1) asymptotically normal on data collected from both multi-arm and contextual bandits and (2) robust to non-stationarity in the baseline reward.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes