LGSTML
Jun 21, 2019

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

arXiv:1906.09231v2
18 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of reliable uncertainty quantification in adaptive data analysis for researchers and practitioners, offering a practical improvement over existing methods.

The paper tackles the problem of providing valid confidence intervals for adaptive statistical queries, introducing a framework that yields instance-specific intervals which are orders of magnitude better than worst-case bounds when paired with effective heuristics.

We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has focused either on providing tight confidence intervals for specific analyses or on providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many settings, often not even beating simple baselines like sample splitting. Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics. When paired with good heuristics, this method gives guarantees that are orders of magnitude better than the best worst-case bounds. We provide a Python library implementing our method.
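For orientation, the sample-splitting baseline named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's framework and not the API of the authors' Python library: the names `SampleSplitter`, `hoeffding_width`, and `answer` are hypothetical. It assumes i.i.d. data, queries that map each data point to a value in [0, 1], and a number of queries `k` fixed in advance. Each query is answered on a fresh chunk of the data, so a standard Hoeffding bound applies to it however adaptively it was chosen, and a union bound over the `k` queries makes all intervals hold simultaneously with probability at least 1 - beta.

```python
import math


def hoeffding_width(m, beta):
    """Half-width of a two-sided Hoeffding interval for the mean of
    m i.i.d. values in [0, 1], valid with probability >= 1 - beta."""
    return math.sqrt(math.log(2.0 / beta) / (2.0 * m))


class SampleSplitter:
    """Hypothetical helper: answers up to k adaptive queries,
    each on its own previously unused chunk of the data."""

    def __init__(self, data, k, beta):
        self.k = k
        chunk_size = len(data) // k
        if chunk_size == 0:
            raise ValueError("need at least one data point per query")
        # Disjoint chunks; any leftover points are simply unused.
        self.chunks = [
            data[i * chunk_size:(i + 1) * chunk_size] for i in range(k)
        ]
        # Union bound: spend beta / k of the failure probability per query.
        self.per_query_beta = beta / k
        self.used = 0

    def answer(self, query):
        """Return (point_estimate, half_width) for a query with values in [0, 1]."""
        if self.used >= self.k:
            raise RuntimeError("query budget exhausted")
        chunk = self.chunks[self.used]
        self.used += 1
        estimate = sum(query(x) for x in chunk) / len(chunk)
        width = hoeffding_width(len(chunk), self.per_query_beta)
        return estimate, width


# Usage: 1000 points, budget of 10 queries, overall failure probability 0.05.
data = [i % 2 for i in range(1000)]
splitter = SampleSplitter(data, k=10, beta=0.05)
estimate, width = splitter.answer(lambda x: x)
```

Because each query sees only n/k points, the interval width in this sketch scales roughly as sqrt(k log(k/beta) / n), which is the kind of simple reference point the abstract compares worst-case bounds against.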

Code Implementations: 1 repo
