SP LG SY Oct 4, 2019

A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis

arXiv:1910.03493v12.39 citations

Originality: Incremental advance

AI Analysis

This work addresses reliable inference in iterative data analysis, a concern for both researchers and practitioners. Its contribution is incremental: it adapts existing generalization bounds to dependent tests.

The paper tackles the problem of controlling generalization error in adaptive statistical analysis where the same holdout data is reused for multiple hypothesis tests, and presents RadaBound, a method based on Rademacher Complexity that demonstrates statistical power and practicality through simulations.

Standard statistical inference techniques and machine learning generalization bounds assume that tests are run on data selected independently of the hypotheses. In practice, data analysis and machine learning are usually iterative and adaptive processes: the same holdout data is often used to test a sequence of hypotheses (or models), each of which may depend on the outcome of previous tests on the same data. In this work, we present RadaBound, a rigorous, efficient, and practical procedure for controlling the generalization error when using a holdout sample for multiple adaptive testing. Our solution is based on a new application of the Rademacher Complexity generalization bounds, adapted to dependent tests. We demonstrate the statistical power and practicality of our method through extensive simulations and comparisons to alternative approaches.
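To make the central quantity concrete, here is a minimal sketch of a Monte Carlo estimate of the empirical Rademacher complexity of a finite set of functions evaluated on a holdout sample. It is not the RadaBound procedure, which this summary does not specify. The function name `empirical_rademacher`, the `values` layout, and the `n_draws` and `seed` parameters are all assumptions made for illustration.

```python
import random


def empirical_rademacher(values, n_draws=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    values[j][i] holds f_j(x_i): the j-th function (for example a test
    statistic or a model's loss) evaluated on the i-th holdout point.
    This layout is an illustrative assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw one vector of independent Rademacher signs (+1 or -1).
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the function set of the sign-weighted sample mean.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / n for row in values
        )
    # Average the supremum over all sign draws.
    return total / n_draws


# Hypothetical usage: three functions evaluated on a holdout of 200 points.
data_rng = random.Random(42)
holdout_values = [[data_rng.random() for _ in range(200)] for _ in range(3)]
estimate = empirical_rademacher(holdout_values, n_draws=500)
```

For a fixed seed the same sign vectors are drawn regardless of how many functions are supplied, so adding a function can only raise the estimate or leave it unchanged. This mirrors the intuition that a richer set of tests on the same holdout data has a larger capacity to overfit it.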
