ML · LG · AP · Jul 24, 2025

A Two-armed Bandit Framework for A/B Testing

arXiv:2507.18118v1 · 2 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the need for more powerful A/B testing methods for policy and product evaluation in technology companies. It is an incremental improvement over existing approaches.

The paper aims to improve the statistical power of A/B testing for policy evaluation by introducing a two-armed bandit framework. The authors report superior performance over existing methods, supported by asymptotic theory, numerical experiments, and real-world ridesharing data.

A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly-developed policy against a standard control. Various causal inference and reinforcement learning methods developed in the literature are applicable to A/B testing. This paper introduces a two-armed bandit framework designed to improve the power of existing approaches. The proposed procedure consists of three main steps: (i) employing doubly robust estimation to generate pseudo-outcomes, (ii) utilizing a two-armed bandit framework to construct the test statistic, and (iii) applying a permutation-based method to compute the $p$-value. We demonstrate the efficacy of the proposed method through asymptotic theories, numerical experiments and real-world data from a ridesharing company, showing its superior performance in comparison to existing methods.
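The three steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several choices are assumptions made for illustration only: the function names are hypothetical, the outcome models are per-arm linear regressions, the treatment probability is taken to be a known 0.5, and the pseudo-outcome is the standard augmented inverse-probability-weighted (AIPW) form. The abstract does not specify the two-armed bandit test statistic, so step (ii) is replaced here by a plain studentized mean as a stand-in.

```python
import numpy as np

def dr_pseudo_outcomes(X, A, Y, e=0.5):
    """Step (i): AIPW-style doubly robust pseudo-outcomes.

    Assumptions (not from the paper): linear outcome models fitted
    separately in each arm, and a known treatment probability e.
    """
    Xd = np.column_stack([np.ones(len(Y)), X])  # add an intercept column
    beta1, *_ = np.linalg.lstsq(Xd[A == 1], Y[A == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(Xd[A == 0], Y[A == 0], rcond=None)
    mu1, mu0 = Xd @ beta1, Xd @ beta0           # predicted outcomes per arm
    return (mu1 - mu0
            + A * (Y - mu1) / e
            - (1 - A) * (Y - mu0) / (1 - e))

def placeholder_statistic(psi):
    """Step (ii) stand-in: studentized mean of the pseudo-outcomes.

    This is NOT the paper's two-armed bandit statistic, which the
    abstract does not describe; it only marks where that statistic goes.
    """
    return np.sqrt(len(psi)) * psi.mean() / psi.std(ddof=1)

def permutation_p_value(X, A, Y, n_perm=500, seed=0):
    """Step (iii): one-sided permutation p-value over treatment labels."""
    rng = np.random.default_rng(seed)
    observed = placeholder_statistic(dr_pseudo_outcomes(X, A, Y))
    null_stats = np.empty(n_perm)
    for b in range(n_perm):
        A_perm = rng.permutation(A)  # shuffle labels to mimic the null
        null_stats[b] = placeholder_statistic(dr_pseudo_outcomes(X, A_perm, Y))
    # The +1 terms keep the p-value strictly positive.
    return (1 + np.sum(null_stats >= observed)) / (1 + n_perm)
```

Given a covariate matrix `X`, a 0/1 treatment vector `A`, and an outcome vector `Y`, calling `permutation_p_value(X, A, Y)` returns a p-value for the one-sided hypothesis that the new policy improves the outcome. Swapping `placeholder_statistic` for a different statistic leaves steps (i) and (iii) unchanged, which is the modularity the three-step description suggests.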

