ML LG Sep 6, 2017

Optimal Sub-sampling with Influence Functions

arXiv:1709.01716v1, 7.632 citations

Originality: Incremental advance

AI Analysis

This work addresses the computational cost of analyzing large datasets, a concern for both researchers and practitioners. It appears incremental, since it builds on existing sub-sampling concepts.

The paper tackles the problem of selecting non-uniform subsamples from large datasets for statistical models by introducing an optimal sampling procedure based on influence functions, demonstrating improved performance over previous methods in linear regression.

Sub-sampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the concept of an asymptotically linear estimator and the associated influence function leads to optimal sampling procedures for a wide class of popular models. Furthermore, for linear regression models, which have well-studied procedures for non-uniform sub-sampling, we show that our optimal influence-function-based method outperforms previous approaches. We empirically show the improved performance of our method on real datasets.
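
To make the idea in the abstract concrete, here is a minimal sketch of influence-based sub-sampling for ordinary least squares. It is an illustration under stated assumptions, not the authors' implementation. It assumes that rows are drawn with probability proportional to the norm of their influence function, that a small uniform pilot sample supplies the preliminary coefficient estimate, and that the subsample is fitted with inverse-probability weights. The function names (`influence_norms`, `influence_subsample_ols`) and the `pilot_size` parameter are hypothetical.

```python
import numpy as np

def influence_norms(X, y, beta):
    """Norm of the OLS influence function for each row, given a pilot estimate beta.

    For OLS the influence of row i is Sigma^{-1} x_i r_i, where
    Sigma = X^T X / n and r_i is the residual under beta.
    """
    n = X.shape[0]
    residuals = y - X @ beta
    sigma_inv = np.linalg.inv(X.T @ X / n)
    # sigma_inv is symmetric, so row i of (X @ sigma_inv) equals Sigma^{-1} x_i
    psi = (X @ sigma_inv) * residuals[:, None]
    return np.linalg.norm(psi, axis=1)

def influence_subsample_ols(X, y, m, pilot_size=200, seed=0):
    """Draw m rows with probability proportional to influence norm, then fit weighted OLS."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Assumption: a uniform pilot sample gives the preliminary estimate.
    pilot = rng.choice(n, size=min(pilot_size, n), replace=False)
    beta_pilot, *_ = np.linalg.lstsq(X[pilot], y[pilot], rcond=None)

    # Assumption: sampling probabilities proportional to the influence norm.
    norms = influence_norms(X, y, beta_pilot)
    probs = norms / norms.sum()

    # Sample with replacement and reweight by inverse probability.
    idx = rng.choice(n, size=m, replace=True, p=probs)
    sqrt_w = np.sqrt(1.0 / (m * probs[idx]))
    beta_hat, *_ = np.linalg.lstsq(X[idx] * sqrt_w[:, None], y[idx] * sqrt_w, rcond=None)
    return beta_hat, probs

# Usage on synthetic data (not from the paper)
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(5000, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta + data_rng.normal(scale=0.1, size=5000)

beta_hat, probs = influence_subsample_ols(X, y, m=500)
print(beta_hat)
```

For simplicity, this sketch computes the covariance term and the residuals on the full dataset, which costs about as much as a full OLS fit. A practical version would approximate both cheaply, since the point of sub-sampling is to avoid that cost.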
