LG MLMay 7, 2024

Data-driven Error Estimation: Upper Bounding Multiple Errors without Class Complexity as Input

Sanath Kumar Krishnamurthy, Anna Lyubarskaja, Emma Brunskill, Susan Athey

arXiv:2405.04636v32.6h-index: 54

Originality Incremental advance

AI Analysis

This addresses the need for simultaneous confidence intervals in statistical and machine learning tasks, offering a more flexible alternative to existing methods like union bounding, though it appears incremental in improving error estimation techniques.

The paper tackles the problem of constructing high-probability upper bounds on maximum errors for a class of estimates, such as in multiple mean estimation or generalization error bounding, by proposing a data-driven method that does not require class complexity as input, achieving results that adapt to unknown correlation structures.

Constructing confidence intervals that are simultaneously valid across a class of estimates is central for tasks such as multiple mean estimation, bounding generalization error in machine learning, and adaptive experimental design. We frame this as an "error estimation problem," where the goal is to determine a high-probability upper bound on the maximum error for a class of estimates. We propose an entirely data-driven approach that derives such bounds for both finite and infinite class settings, naturally adapting to a potentially unknown correlation structure of random errors. Notably, our method does not require class complexity as an input, overcoming a major limitation of existing approaches such as union bounding and bounds based on Talagrand's inequality. In this paper, we present our simple yet general solution and demonstrate its flexibility through applications ranging from constructing multiple simultaneously valid confidence intervals to optimizing exploration in contextual bandit algorithms.

View on arXiv PDF

Similar