ME, LG · Aug 1, 2024

Early Stopping Based on Repeated Significance

arXiv:2408.00908v1 · 1 citation · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work addresses a practical problem in A/B testing, keeping statistical confidence when a test may stop early, though it appears incremental because it builds on existing correction methods such as the Bonferroni correction.

The paper tackles early stopping in bucket tests with multiple success criteria. Instead of splitting α over every criterion and decision point, which forces very small p-value thresholds, it requires each criterion to be successful at multiple decision points.
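
To see why splitting α gets strict quickly, here is a minimal sketch of an even Bonferroni split. It assumes α is divided equally among the criteria and, for early stopping, among the decision points as well; the helper names `bonferroni_threshold` and `successful_criteria` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def bonferroni_threshold(alpha: float, num_criteria: int, num_decision_points: int = 1) -> float:
    """Per-test p-value threshold when alpha is split evenly over every
    (criterion, decision point) pair, as in a plain Bonferroni correction."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if num_criteria < 1 or num_decision_points < 1:
        raise ValueError("num_criteria and num_decision_points must be positive")
    return alpha / (num_criteria * num_decision_points)


def successful_criteria(p_values: Sequence[float], alpha: float) -> List[bool]:
    """Fixed-horizon check (hypothetical helper): each criterion succeeds only
    if its p-value is below an equal share of alpha."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 1))      # one criterion, one look: 0.05
    print(bonferroni_threshold(0.05, 4))      # four criteria: 0.0125
    print(bonferroni_threshold(0.05, 4, 10))  # four criteria, ten looks: about 0.00125
    print(successful_criteria([0.01, 0.02, 0.004, 0.3], 0.05))  # [True, False, True, False]
```

The last line shows the cost described in the abstract: a criterion with p = 0.02 would pass on its own at α = 0.05 but fails once α is shared four ways, and checking at ten decision points pushes the per-test threshold another factor of ten lower.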

For a bucket test with a single criterion for success and a fixed number of samples or testing period, requiring a $p$-value less than a specified value of $α$ for the success criterion produces statistical confidence at level $1 - α$. For multiple criteria, a Bonferroni correction that partitions $α$ among the criteria produces statistical confidence, at the cost of requiring lower $p$-values for each criterion. The same concept can be applied to decisions about early stopping, but that can lead to strict requirements for $p$-values. We show how to address that challenge by requiring criteria to be successful at multiple decision points.
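
The effect of requiring success at more than one decision point can be explored empirically. The following is a minimal Monte Carlo sketch, not the paper's procedure or its confidence bound. It assumes a single criterion, a one-sided z-test on cumulative data, equally sized batches between decision points, no true effect (the null hypothesis), and a hypothetical rule that stops with success only after `required_streak` consecutive significant decision points. The function name `simulate_false_stop_rate` is illustrative.

```python
import random
from statistics import NormalDist


def simulate_false_stop_rate(
    num_looks: int,
    alpha_per_look: float,
    required_streak: int,
    trials: int = 5000,
    seed: int = 0,
) -> float:
    """Estimate P(stop early with 'success') when there is no true effect.

    Hypothetical rule (illustration only, not the paper's method): at each
    decision point, compute a one-sided z-test p-value on all data so far and
    stop with success once `required_streak` consecutive decision points
    have p < alpha_per_look.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, used for p = 1 - Phi(z)
    false_stops = 0
    for _ in range(trials):
        # Draw the whole path up front so every rule sees identical data for a given seed.
        increments = [rng.gauss(0.0, 1.0) for _ in range(num_looks)]
        total = 0.0
        streak = 0
        for look, increment in enumerate(increments, start=1):
            total += increment  # cumulative sum of `look` standard-normal batches
            z = total / look ** 0.5  # N(0, 1) under the null at every look
            p_value = 1.0 - std_normal.cdf(z)
            streak = streak + 1 if p_value < alpha_per_look else 0
            if streak >= required_streak:
                false_stops += 1
                break
    return false_stops / trials


if __name__ == "__main__":
    for streak in (1, 2, 3):
        rate = simulate_false_stop_rate(num_looks=10, alpha_per_look=0.05, required_streak=streak)
        print(f"streak={streak}: estimated false-stop rate {rate:.3f}")
```

Because every trial draws the same data for a given seed regardless of `required_streak`, the estimated false-stop rate can only stay the same or fall as the streak requirement grows. Comparing it with a Bonferroni-style per-look threshold such as `alpha_per_look = 0.05 / num_looks` gives a rough feel for the trade-off the abstract describes.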
