LG, ML | Jun 27, 2012

The Big Data Bootstrap

arXiv:1206.6415v1 | 152 citations
AI Analysis

The proposed method provides a computationally efficient alternative to the standard bootstrap for statistical inference on large datasets, addressing a bottleneck for researchers and practitioners who work with large-scale data analysis.

The paper tackles the computational cost of bootstrap methods on large datasets by introducing the Bag of Little Bootstraps (BLB), a procedure that combines features of the bootstrap and subsampling to assess estimator quality efficiently while retaining the bootstrap's favorable statistical properties.

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we present the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality. BLB is well suited to modern parallel and distributed computing architectures and retains the generic applicability, statistical efficiency, and favorable theoretical properties of the bootstrap. We provide the results of an extensive empirical and theoretical investigation of BLB's behavior, including a study of its statistical correctness, its large-scale implementation and performance, selection of hyperparameters, and performance on real data.
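The abstract does not spell out the procedure, so the following is only a minimal sketch of how BLB is commonly described: draw a few small subsets of size b = n^gamma without replacement, resample each subset up to the full size n using multinomial counts, compute a quality measure (here the standard error) per subset, and average across subsets. The function names, the default values of `gamma`, `num_subsets`, and `num_resamples`, and the choice of the mean as the estimator are illustrative assumptions, not taken from the paper.

```python
import random
import statistics
from collections import Counter


def weighted_mean(values, counts):
    """Mean of `values` where values[i] is treated as appearing counts[i] times."""
    total = sum(counts)
    return sum(v * c for v, c in zip(values, counts)) / total


def blb_standard_error(data, estimator=weighted_mean, gamma=0.7,
                       num_subsets=5, num_resamples=50, seed=0):
    """Sketch of a BLB-style standard error estimate (hypothetical helper).

    All default hyperparameters are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    n = len(data)
    # Subset size b = n^gamma, clamped so it is at least 2 and at most n.
    b = max(2, min(n, int(n ** gamma)))

    per_subset_se = []
    for _ in range(num_subsets):
        # Subsampling step: b distinct points, drawn without replacement.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(num_resamples):
            # Bootstrap step: a size-n resample of the subset, stored as
            # multinomial counts over its b points instead of n raw values.
            drawn = Counter(rng.choices(range(b), k=n))
            counts = [drawn.get(i, 0) for i in range(b)]
            estimates.append(estimator(subset, counts))
        # Quality measure for this subset: spread of the resampled estimates.
        per_subset_se.append(statistics.stdev(estimates))

    # Combine the per-subset assessments by averaging.
    return statistics.mean(per_subset_se)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print(blb_standard_error(sample))
```

For unit-variance data of size 2000, the theoretical standard error of the mean is 1/sqrt(2000), about 0.022, so the printed estimate should land near that value. Because each resample is represented by counts over only b points, the estimator touches b values rather than n, which is where the computational saving in this sketch comes from; the outer loop over subsets is independent per iteration and could be distributed across workers.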
