STLGMLFeb 2, 2019

On the bias, risk and consistency of sample means in multi-armed bandits

arXiv:1902.00746v341 citations
AI Analysis

This work addresses a fundamental issue in bandit experiments for researchers and practitioners, providing a nonparametric and algorithm-agnostic analysis, but it is incremental as it builds on existing statistical concepts.

The paper tackles the problem of understanding the bias, risk, and consistency of sample means in multi-armed bandit experiments, where these estimators are biased, and it identifies four sources of selection bias and bounds the risk using a new notion of effective sample size.

The sample mean is among the most well studied estimators in statistics, having many desirable properties such as unbiasedness and consistency. However, when analyzing data collected using a multi-armed bandit (MAB) experiment, the sample mean is biased and much remains to be understood about its properties. For example, when is it consistent, how large is its bias, and can we bound its mean squared error? This paper delivers a thorough and systematic treatment of the bias, risk and consistency of MAB sample means. Specifically, we identify four distinct sources of selection bias (sampling, stopping, choosing and rewinding) and analyze them both separately and together. We further demonstrate that a new notion of \emph{effective sample size} can be used to bound the risk of the sample mean under suitable loss functions. We present several carefully designed examples to provide intuition on the different sources of selection bias we study. Our treatment is nonparametric and algorithm-agnostic, meaning that it is not tied to a specific algorithm or goal. In a nutshell, our proofs combine variational representations of information-theoretic divergences with new martingale concentration inequalities.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes