Generic E-Variables for Exact Sequential k-Sample Tests that allow for Optional Stopping
This work addresses the need for robust sequential testing methods in statistics, particularly in scenarios involving optional stopping. It offers a practical solution for researchers and practitioners, though the contribution is incremental in that it builds on existing E-variable theory.
The paper tackles the problem of testing whether multiple data streams originate from the same source, developing E-variables that provide exact, nonasymptotic tests with type-I error guarantees under flexible sampling schemes such as optional stopping. The results show that these E-variables often allow for early stopping while maintaining power similar to that of classical methods, as demonstrated in simulations and a real-world example.
We develop E-variables for testing whether two or more data streams come from the same source or not, and more generally, whether the difference between the sources is larger than some minimal effect size. These E-variables lead to exact, nonasymptotic tests that remain safe, i.e., keep their type-I error guarantees, under flexible sampling scenarios such as optional stopping and continuation. In special cases our E-variables also have an optimal 'growth' property under the alternative. While the construction is generic, we illustrate it through the special case of k × 2 contingency tables, where we also allow for the incorporation of different restrictions on a composite alternative. Comparisons with p-value analysis in simulations and a real-world example show that E-variables, through their flexibility, often allow for early stopping of data collection while retaining power similar to that of classical methods, and while also retaining the option of extending or combining data afterwards.
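To make the idea of a safe sequential test concrete, the following is a minimal sketch for the simplest 2 × 2 case: two Bernoulli streams observed in blocks of one outcome per group. It is an illustration under stated assumptions, not necessarily the exact construction used in the paper. The alternative parameters `theta_a` and `theta_b` are fixed in advance (a hypothetical choice), and the null reference parameter is taken to be their average. For this block design the expected value of the resulting E-variable under any common null parameter is at most 1, which the helper `expected_e_value_under_null` checks by exact enumeration. Multiplying E-values across blocks and rejecting as soon as the product reaches 1/alpha then keeps the type-I error at most alpha at any stopping time, by Ville's inequality. All function names here are hypothetical.

```python
from itertools import product


def bernoulli_pmf(y, theta):
    """Probability of outcome y (0 or 1) under a Bernoulli(theta) source."""
    return theta if y == 1 else 1.0 - theta


def e_value_block(y_a, y_b, theta_a, theta_b):
    """E-value for one block holding one outcome from each of two groups.

    Assumption: the alternative (theta_a, theta_b) is fixed beforehand, with
    both values strictly between 0 and 1. The null reference parameter is
    their average, matching the equal group sizes within a block.
    """
    theta_0 = 0.5 * (theta_a + theta_b)
    numerator = bernoulli_pmf(y_a, theta_a) * bernoulli_pmf(y_b, theta_b)
    denominator = bernoulli_pmf(y_a, theta_0) * bernoulli_pmf(y_b, theta_0)
    return numerator / denominator


def expected_e_value_under_null(theta, theta_a, theta_b):
    """Exact expectation of the block E-value when both groups share theta."""
    return sum(
        bernoulli_pmf(y_a, theta)
        * bernoulli_pmf(y_b, theta)
        * e_value_block(y_a, y_b, theta_a, theta_b)
        for y_a, y_b in product((0, 1), repeat=2)
    )


def sequential_test(blocks, theta_a, theta_b, alpha=0.05):
    """Multiply block E-values; stop and reject once the product >= 1/alpha.

    Returns (number of blocks used, final E-value, whether the null was rejected).
    """
    e_process = 1.0
    for t, (y_a, y_b) in enumerate(blocks, start=1):
        e_process *= e_value_block(y_a, y_b, theta_a, theta_b)
        if e_process >= 1.0 / alpha:
            return t, e_process, True
    return len(blocks), e_process, False
```

For example, `sequential_test([(0, 1)] * 10, theta_a=0.3, theta_b=0.7)` stops after the fifth block: each block contributes a factor of 1.96, and the running product first exceeds 20 at that point. Because the guarantee holds at every stopping time, the remaining blocks need not be collected, and data gathered later could still be multiplied in.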