Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?
This work addresses a practical problem for teams running A/B tests on recommendation systems, but the contribution is incremental: it builds directly on prior work on symbiosis bias.
The paper studies how to select the better recommendation algorithm in A/B experiments with data sharing, where interference between the treatment and control groups biases standard estimators. It argues that the sign of the global treatment effect is often sufficient for decision-making, and shows that the exploration-exploitation trade-off determines when that sign is estimated correctly.
We study A/B experiments that are designed to compare the performance of two recommendation algorithms. Prior work has shown that the standard difference-in-means estimator is biased in estimating the global treatment effect (GTE) due to a particular form of interference between experimental units. Specifically, units under the treatment and control algorithms contribute to a shared pool of data that subsequently train both algorithms, resulting in interference between the two groups. The bias arising from this type of data sharing is known as "symbiosis bias". In this paper, we highlight that, for decision-making purposes, the sign of the GTE often matters more than its precise magnitude when selecting the better algorithm. We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the expected GTE estimate under data sharing aligns with or contradicts the sign of the true GTE. Our analysis identifies the level of exploration versus exploitation as a key determinant of how symbiosis bias impacts algorithm selection.
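The data-sharing setup described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's model or analysis: it assumes Bernoulli arms, epsilon-greedy policies that differ only in their exploration rate, uniform random assignment of units, and a neutral prior of 0.5 for unpulled arms. The names `simulate` and `compare_sign` are hypothetical. It estimates the GTE by deploying each algorithm alone on its own data, and compares it with the difference-in-means estimate from an A/B experiment in which both algorithms learn from one pooled dataset.

```python
import random


def simulate(arm_means, epsilons, horizon, rng):
    """Serve `horizon` units with epsilon-greedy policies that share one data pool.

    Each unit is assigned uniformly at random to one policy in `epsilons`.
    Every policy picks arms from the same pooled reward statistics, which is
    the data-sharing mechanism. Returns the mean reward per policy.
    """
    n_arms = len(arm_means)
    pulls = [0] * n_arms          # pooled pull counts per arm
    reward_sum = [0.0] * n_arms   # pooled reward totals per arm
    group_total = [0.0] * len(epsilons)
    group_count = [0] * len(epsilons)

    def estimate(arm):
        # Assumption: unpulled arms get a neutral prior estimate of 0.5.
        return reward_sum[arm] / pulls[arm] if pulls[arm] else 0.5

    for _ in range(horizon):
        group = rng.randrange(len(epsilons))
        if rng.random() < epsilons[group]:
            arm = rng.randrange(n_arms)             # explore
        else:
            arm = max(range(n_arms), key=estimate)  # exploit pooled data
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        reward_sum[arm] += reward
        group_total[group] += reward
        group_count[group] += 1

    # A policy that received no units reports 0.0 (rare for realistic horizons).
    return [t / c if c else 0.0 for t, c in zip(group_total, group_count)]


def compare_sign(arm_means, eps_control, eps_treatment,
                 horizon=200, runs=500, seed=0):
    """Return (Monte Carlo GTE, mean A/B difference-in-means under data sharing)."""
    rng = random.Random(seed)
    gte = 0.0
    ab = 0.0
    for _ in range(runs):
        # Global deployments: each algorithm serves every unit and learns alone.
        treated = simulate(arm_means, [eps_treatment], horizon, rng)[0]
        control = simulate(arm_means, [eps_control], horizon, rng)[0]
        gte += treated - control
        # A/B experiment: both algorithms learn from the same pooled data.
        c, t = simulate(arm_means, [eps_control, eps_treatment], horizon, rng)
        ab += t - c
    return gte / runs, ab / runs


if __name__ == "__main__":
    gte, ab = compare_sign([0.3, 0.5, 0.7], eps_control=0.05, eps_treatment=0.4)
    print(f"GTE estimate: {gte:+.4f}")
    print(f"A/B estimate: {ab:+.4f}")
    print("signs agree:", (gte > 0) == (ab > 0))
```

Varying `eps_control`, `eps_treatment`, and `horizon` gives a rough empirical feel for how the exploration level affects whether the two signs agree. The Monte Carlo averages are noisy, so a sign near zero should not be over-interpreted.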