ME · AP · CO · ML · Jun 2, 2020

Hyperparameter Selection for Subsampling Bootstraps

arXiv:2006.01786v2
AI Analysis

For statisticians and data analysts working with massive datasets, this work addresses a key bottleneck in subsampling methods, offering an incremental improvement in hyperparameter tuning.

The paper tackles hyperparameter selection for subsampling bootstraps such as BLB, a choice that strongly affects how well these methods assess estimator quality in massive data analysis. It develops a methodology that theoretically identifies an optimal set of hyperparameters, improving statistical efficiency at no extra computational cost.

As massive data analysis becomes increasingly prevalent, subsampling methods such as BLB (Bag of Little Bootstraps) serve as powerful tools for assessing the quality of estimators for massive data. However, the performance of subsampling methods is highly influenced by the choice of tuning parameters (e.g., the subset size and the number of resamples per subset). In this article we develop a hyperparameter selection methodology that can be used to select tuning parameters for subsampling methods. Specifically, through a careful theoretical analysis, we find an analytically simple and elegant relationship between the asymptotic efficiency of various subsampling estimators and their hyperparameters. This leads to an optimal choice of the hyperparameters. More specifically, any arbitrarily specified hyperparameter set can be improved into a new set that incurs no extra CPU time cost, while the resulting estimator's statistical efficiency can be much improved. Both simulation studies and real data analysis demonstrate the advantages of our method.
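To make the tuning parameters concrete, here is a minimal sketch of a generic BLB estimate of the standard error of a sample mean, written with NumPy. It is not the paper's selection method. The parameter names `gamma` (subset size exponent, b = n^gamma), `s` (number of subsets) and `r` (resamples per subset), as well as the helper names `blb_standard_error` and `blb_cost`, are illustrative assumptions. The cost proxy s * r * b assumes each resample costs one O(b) weighted fit.

```python
import numpy as np


def blb_standard_error(x, gamma=0.7, s=10, r=50, seed=0):
    """Generic BLB estimate of the standard error of the mean (a sketch).

    Hypothetical hyperparameter names (not taken from the paper):
      gamma: subset size exponent, b = int(n ** gamma)
      s:     number of subsets
      r:     number of resamples per subset
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    b = int(n ** gamma)
    per_subset_se = []
    for _ in range(s):
        # Draw a subset of b distinct points without replacement.
        subset = x[rng.choice(n, size=b, replace=False)]
        estimates = np.empty(r)
        for k in range(r):
            # A full-size (n) resample of the subset is stored as multinomial
            # counts over its b points, so each fit touches only b values.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates[k] = np.average(subset, weights=counts)
        # Quality measure on this subset: spread of the r resampled estimates.
        per_subset_se.append(estimates.std(ddof=1))
    # BLB averages the per-subset quality measures.
    return float(np.mean(per_subset_se))


def blb_cost(s, r, b):
    """Rough CPU cost proxy (an assumption): one O(b) weighted fit per resample."""
    return s * r * b


# Usage: for standard normal data, the true standard error of the mean
# is 1 / sqrt(n), i.e. 0.01 for n = 10_000.
x = np.random.default_rng(1).standard_normal(10_000)
se = blb_standard_error(x)
```

Under this cost proxy, hyperparameter sets with the same product s * r * b, such as (s=10, r=50) and (s=25, r=20) at a fixed b, cost roughly the same CPU time. Choosing the statistically better set among such equal-cost alternatives is the question the paper addresses; this sketch does not attempt that choice.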
