Confidence Intervals for Algorithmic Leveraging in Linear Regression
This work addresses the need for statistical inference in big data analysis, offering a practical solution for researchers and practitioners using algorithmic leveraging in linear regression.
The paper tackles the problem of constructing reliable confidence intervals for regression coefficients estimated via algorithmic leveraging, a method that analyzes a large dataset by sampling observations from it. The result is a set of efficient algorithms that produce finite-sample confidence intervals with asymptotic coverage guarantees. In simulations, these intervals achieve the desired coverage probabilities, whereas bootstrap confidence intervals may not.
The age of big data has produced data sets that are computationally expensive to analyze and store. Algorithmic leveraging proposes that we sample observations from the original data set to generate a representative data set and then perform the analysis on that representative data set. In this paper, we present efficient algorithms for constructing finite-sample confidence intervals, with asymptotic coverage guarantees, for each regression coefficient estimated by algorithmic leveraging. In simulations, we confirm empirically that these confidence intervals have the desired coverage probabilities, while bootstrap confidence intervals may not.
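To make the sampling step concrete, the following is a minimal sketch of one common form of algorithmic leveraging, not the paper's own procedure. It assumes rows are drawn with replacement with probability proportional to their leverage scores and that the subsample is then fit by reweighted least squares. The function names `leverage_scores` and `leveraging_estimate`, the simulated data, and the subsample size are illustrative assumptions. The confidence interval construction itself is not shown, since the abstract does not specify it.

```python
import numpy as np


def leverage_scores(X):
    """Leverage of row i = squared norm of row i of Q in the thin QR of X."""
    Q, _ = np.linalg.qr(X)  # reduced QR: Q has shape (n, p)
    return np.sum(Q**2, axis=1)


def leveraging_estimate(X, y, r, rng):
    """Hypothetical helper: leverage-based subsampling plus weighted least squares."""
    n, _ = X.shape
    h = leverage_scores(X)
    probs = h / h.sum()  # assumed sampling distribution, proportional to leverage
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Rescale each sampled row by 1/sqrt(r * pi_i) so the subsampled
    # least-squares objective is unbiased for the full-data objective.
    w = 1.0 / np.sqrt(r * probs[idx])
    Xs = X[idx] * w[:, None]
    ys = y[idx] * w
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n, p = 20000, 5
X = rng.standard_normal((n, p))
beta_true = np.arange(1, p + 1, dtype=float)
y = X @ beta_true + rng.standard_normal(n)

h = leverage_scores(X)
beta_hat = leveraging_estimate(X, y, r=2000, rng=rng)
```

Here `beta_hat` is computed from 2,000 sampled rows rather than all 20,000. The inference question the paper addresses is how to attach valid confidence intervals to each entry of such an estimate.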