CL · LG · ML · Nov 16, 2023

Show Your Work with Confidence: Confidence Bands for Tuning Curves

arXiv:2311.09480v2 · 32 citations · h-index: 12 · Has Code
Originality: Highly original
AI Analysis

The method gives machine learning researchers and practitioners a robust tool for comparing model performance with confidence, addressing the common problem that one method may look better than another only because it was better tuned. The contribution is incremental in that it builds on existing tuning curve concepts.

The paper addresses ambiguous comparisons under hyperparameter tuning in NLP by introducing the first method to construct exact, simultaneous, and distribution-free confidence bands for tuning curves. These bands enable rigorous comparisons between methods, and empirical validation shows that they achieve their target confidence exactly while the baseline bands fail to do so.

The choice of hyperparameters greatly impacts performance in natural language processing. Often, it is hard to tell if a method is better than another or just better tuned. Tuning curves fix this ambiguity by accounting for tuning effort. Specifically, they plot validation performance as a function of the number of hyperparameter choices tried so far. While several estimators exist for these curves, it is common to use point estimates, which we show fail silently and give contradictory results when given too little data. Beyond point estimates, confidence bands are necessary to rigorously establish the relationship between different approaches. We present the first method to construct valid confidence bands for tuning curves. The bands are exact, simultaneous, and distribution-free, so they provide a robust basis for comparing methods. Empirical analysis shows that while bootstrap confidence bands, which serve as a baseline, fail to approximate their target confidence, ours achieve it exactly. We validate our design with ablations, analyze the effect of sample size, and provide guidance on comparing models with our method. To promote confident comparisons in future work, we release opda, an easy-to-use library that you can install with pip: https://github.com/nicholaslourie/opda
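To make the idea of a tuning curve with a confidence band concrete, here is a minimal sketch in NumPy. It is not the paper's construction and does not use the opda API: it swaps in the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which gives a simultaneous, distribution-free band for the score CDF that is conservative rather than exact. The function names `order_statistic_quantile` and `median_tuning_curve_dkw`, the `support` parameter, and the simulated scores are all hypothetical, and the sketch assumes the hyperparameter configurations were sampled independently (as in random search).

```python
import numpy as np


def order_statistic_quantile(sorted_ys, p, below, above):
    """Return the smallest sample y whose empirical CDF is at least p.

    Levels p <= 0 map to `below` and levels p > 1 map to `above`,
    which is what happens when a CDF band is clipped at 0 or 1.
    """
    n = len(sorted_ys)
    p = np.asarray(p, dtype=float)
    # The p-quantile of the empirical CDF is the ceil(p * n)-th order
    # statistic; the small tolerance guards against float round-off.
    idx = np.ceil(p * n - 1e-9).astype(int) - 1
    out = sorted_ys[np.clip(idx, 0, n - 1)].astype(float)
    out[p <= 0] = below
    out[p > 1] = above
    return out


def median_tuning_curve_dkw(ys, ks, confidence=0.95,
                            support=(-np.inf, np.inf)):
    """Median best-of-k score with a conservative DKW-based band.

    Hypothetical helper: a stand-in for exact bands, not the paper's method.
    """
    ys = np.sort(np.asarray(ys, dtype=float))
    ks = np.atleast_1d(ks)
    n = len(ys)
    # DKW: with probability >= confidence, |F_n(y) - F(y)| <= eps for all y.
    eps = np.sqrt(np.log(2.0 / (1.0 - confidence)) / (2.0 * n))
    # The best of k i.i.d. draws has CDF F(y)**k, so its median is the
    # 0.5**(1/k) quantile of F.
    q = 0.5 ** (1.0 / ks)
    point = order_statistic_quantile(ys, q, *support)
    # A higher CDF means smaller quantiles: the upper CDF band (F_n + eps)
    # bounds the curve from below, and the lower CDF band from above.
    lower = order_statistic_quantile(ys, q - eps, *support)
    upper = order_statistic_quantile(ys, q + eps, *support)
    return lower, point, upper


# Simulated validation accuracies standing in for 48 random-search runs.
rng = np.random.default_rng(0)
accuracies = rng.beta(8, 2, size=48)
ks = np.arange(1, 11)
lower, point, upper = median_tuning_curve_dkw(
    accuracies, ks, confidence=0.95, support=(0.0, 1.0)
)
for k, lo, mid, hi in zip(ks, lower, point, upper):
    print(f"k={k:2d}  median best-of-k: {mid:.3f}  band: [{lo:.3f}, {hi:.3f}]")
```

Because every bound is derived from a single CDF band that holds for all scores at once, the resulting band covers the whole median tuning curve simultaneously rather than one k at a time. With few samples, the upper bound reaches the edge of the support at larger k, which illustrates why sample size matters when comparing methods this way.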

Code Implementations: 1 repo