A/B/n Testing with Control in the Presence of Subpopulations
This work addresses the need for faster and more reliable decision-making in online experiments such as A/B testing, particularly on platforms with diverse user groups. The contribution is incremental, as it builds on existing bandit and testing frameworks.
The paper tackles the problem of efficiently identifying which arms outperform a control in A/B/n testing with stratified subpopulations. It proposes a sequential strategy that is asymptotically optimal: its expected stopping time grows linearly with log(1/δ) at the optimal rate.
Motivated by A/B/n testing applications, we consider a finite set of distributions (called \emph{arms}), one of which is treated as a \emph{control}. We assume that the population is stratified into homogeneous subpopulations. At every time step, a subpopulation is sampled and an arm is chosen: the resulting observation is an independent draw from the arm conditioned on the subpopulation. The quality of each arm is assessed through a weighted combination of its subpopulation means. We propose a strategy for sequentially choosing one arm per time step so as to discover as fast as possible which arms, if any, have higher weighted expectation than the control. This strategy is shown to be asymptotically optimal in the following sense: if $\tau_\delta$ is the first time when the strategy ensures that it is able to output the correct answer with probability at least $1-\delta$, then $\mathbb{E}[\tau_\delta]$ grows linearly with $\log(1/\delta)$ at the exact optimal rate. This rate is identified in the paper in three different settings: (1) when the experimenter does not observe the subpopulation information, (2) when the subpopulation of each sample is observed but not chosen, and (3) when the experimenter can select the subpopulation from which each response is sampled. We illustrate the efficiency of the proposed strategy with numerical simulations on synthetic and real data collected from an A/B/n experiment.
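
To make the target of the problem concrete, the following minimal sketch computes each arm's weighted combination of subpopulation means and lists the arms whose weighted mean exceeds the control's. It is not the paper's sequential strategy: it assumes the true subpopulation means are known, whereas the strategy must learn the answer from samples. The function names, the choice of arm 0 as control, and the numbers are hypothetical and chosen only for illustration.

```python
def weighted_means(means, weights):
    """Weighted combination of subpopulation means, one value per arm.

    means[a][j] is the (assumed known) mean of arm a in subpopulation j;
    weights[j] is the weight given to subpopulation j.
    """
    return [sum(w * m for w, m in zip(weights, row)) for row in means]


def arms_better_than_control(means, weights, control=0):
    """Indices of arms whose weighted mean is strictly above the control's."""
    scores = weighted_means(means, weights)
    return [a for a, s in enumerate(scores)
            if a != control and s > scores[control]]


# Hypothetical instance: 3 arms (arm 0 is the control), 2 subpopulations.
means = [
    [0.50, 0.30],  # control
    [0.60, 0.20],  # better in subpopulation 0, worse in subpopulation 1
    [0.40, 0.50],  # worse in subpopulation 0, better in subpopulation 1
]
weights = [0.75, 0.25]

scores = weighted_means(means, weights)
better = arms_better_than_control(means, weights)
print(scores, better)
```

In this instance each non-control arm beats the control in one subpopulation and loses in the other, so the answer depends on the weights: here only arm 1 has a higher weighted mean than the control.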