STFeb 19, 2020
On conditional versus marginal bias in multi-armed banditsJaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo
The bias of the sample means of the arms in multi-armed bandits is an important issue in adaptive data analysis that has recently received considerable attention in the literature. Existing results relate in precise ways the sign and magnitude of the bias to various sources of data adaptivity, but do not apply to the conditional inference setting in which the sample means are computed only if some specific conditions are satisfied. In this paper, we characterize the sign of the conditional bias of monotone functions of the rewards, including the sample mean. Our results hold for arbitrary conditioning events and leverage natural monotonicity properties of the data collection policy. We further demonstrate, through several examples from sequential testing and best arm identification, that the sign of the conditional and marginal bias of the sample mean of an arm can be different, depending on the conditioning event. Our analysis offers new and interesting perspectives on the subtleties of assessing the bias in data adaptive settings.
STMay 27, 2019
Are sample means in multi-armed bandits positively or negatively biased?Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo
It is well known that in stochastic multi-armed bandits (MAB), the sample mean of an arm is typically not an unbiased estimator of its true mean. In this paper, we decouple three different sources of this selection bias: adaptive \emph{sampling} of arms, adaptive \emph{stopping} of the experiment, and adaptively \emph{choosing} which arm to study. Through a new notion called ``optimism'' that captures certain natural monotonic behaviors of algorithms, we provide a clean and unified analysis of how optimistic rules affect the sign of the bias. The main takeaway message is that optimistic sampling induces a negative bias, but optimistic stopping and optimistic choosing both induce a positive bias. These results are derived in a general stochastic MAB setup that is entirely agnostic to the final aim of the experiment (regret minimization or best-arm identification or anything else). We provide examples of optimistic rules of each type, demonstrate that simulations confirm our theoretical predictions, and pose some natural but hard open problems.
STFeb 2, 2019
On the bias, risk and consistency of sample means in multi-armed banditsJaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo
The sample mean is among the most well studied estimators in statistics, having many desirable properties such as unbiasedness and consistency. However, when analyzing data collected using a multi-armed bandit (MAB) experiment, the sample mean is biased and much remains to be understood about its properties. For example, when is it consistent, how large is its bias, and can we bound its mean squared error? This paper delivers a thorough and systematic treatment of the bias, risk and consistency of MAB sample means. Specifically, we identify four distinct sources of selection bias (sampling, stopping, choosing and rewinding) and analyze them both separately and together. We further demonstrate that a new notion of \emph{effective sample size} can be used to bound the risk of the sample mean under suitable loss functions. We present several carefully designed examples to provide intuition on the different sources of selection bias we study. Our treatment is nonparametric and algorithm-agnostic, meaning that it is not tied to a specific algorithm or goal. In a nutshell, our proofs combine variational representations of information-theoretic divergences with new martingale concentration inequalities.