MELG Aug 5, 2024

Steady Continuous Monitoring is (Just Barely) Impossible for Tests of Unbounded Length

arXiv:2408.02821v2
h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental statistical challenge for practitioners of online experimentation, such as tech companies. It provides both a theoretical limitation and a practical workaround, though it is an incremental refinement of existing methods.

The paper tackles the conflict between early stopping and retaining statistical power in continuously monitored A/B tests of unbounded length. It shows that holding the significance requirement constant is impossible, but that this goal can be approximated arbitrarily closely with tests that require repeated significant results.
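
One standard way to see why a constant threshold cannot survive an unbounded horizon is the law of the iterated logarithm. This is offered as background and is not necessarily the paper's own argument. The notation is an assumption made for illustration: under the null hypothesis the observations $X_i$ are i.i.d. with mean zero and unit variance, $S_n = \sum_{i=1}^{n} X_i$, and $Z_n = S_n/\sqrt{n}$ is the z-statistic at look $n$.

```latex
\limsup_{n\to\infty} \frac{S_n}{\sqrt{2 n \log\log n}} = 1 \quad \text{a.s.}
\;\Longrightarrow\;
\limsup_{n\to\infty} Z_n
  = \limsup_{n\to\infty} \frac{S_n}{\sqrt{2 n \log\log n}} \cdot \sqrt{2 \log\log n}
  = \infty \quad \text{a.s.}
\;\Longrightarrow\;
\Pr\bigl(\exists\, n : |Z_n| > c\bigr) = 1 \quad \text{for every fixed } c .
```

Under these assumptions any fixed critical value $c$ is eventually crossed by chance alone, so a boundary that controls the false positive rate must grow with $n$, though only at a rate on the order of $\sqrt{2 \log\log n}$.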

AB testing evaluates the difference between a control and a treatment in a statistically rigorous manner. Continuous monitoring allows statistical evaluation of an AB test as it proceeds. One goal of continuous monitoring is early stopping -- confirming a statistically significant difference between control and treatment as soon as possible. Another goal is to maintain some statistical capability to discover significant differences later in the test if they cannot be confirmed earlier. These goals are in conflict -- looser requirements for early stopping leave us with more stringent ones for later. This paper shows that it is impossible to maintain a constant requirement for significance for tests that have no a priori stopping time, but we can come arbitrarily close to that goal by using tests that require repeated significant results to confirm statistically significant differences between treatment and control.
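
The tension described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's construction. It assumes standard normal observations under the null, a fixed two-sided critical value of 1.96, and a simplified reading of "repeated significant results" as `repeats` consecutive significant looks. The function name `false_positive_rate` and all of its parameters are hypothetical and chosen only for this example.

```python
import math
import random


def false_positive_rate(n_steps, z_crit=1.96, repeats=1, n_paths=500, seed=0):
    """Estimate how often a null experiment is (falsely) declared significant.

    Each simulated experiment has no true effect. It is checked after every
    observation and is stopped once |z_n| > z_crit at `repeats` consecutive
    looks. Returns the fraction of experiments stopped within `n_steps` looks.
    """
    stopped = 0
    for path in range(n_paths):
        # One RNG per path, so path i sees the same observations
        # regardless of n_steps or repeats.
        rng = random.Random(seed + path)
        total = 0.0
        streak = 0
        for n in range(1, n_steps + 1):
            total += rng.gauss(0.0, 1.0)   # observation with no true effect
            z = total / math.sqrt(n)       # z-statistic after n observations
            streak = streak + 1 if abs(z) > z_crit else 0
            if streak >= repeats:
                stopped += 1
                break
    return stopped / n_paths


if __name__ == "__main__":
    for horizon in (1, 10, 100, 1000):
        single = false_positive_rate(horizon)
        repeated = false_positive_rate(horizon, repeats=5)
        print(f"looks={horizon:5d}  single-hit={single:.3f}  5-in-a-row={repeated:.3f}")
```

In this sketch the single-hit false positive rate can only grow as the horizon lengthens, because the same simulated paths are reused and more looks give more chances to cross the fixed threshold. Requiring several consecutive significant looks can only lower it. Neither rule with a fixed 1.96 threshold controls the error rate over an unbounded horizon; the simulation only shows the direction of the trade-off.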
