ST LG ML

Feb 24, 2020

Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms

Ping Ma, Xinlian Zhang, Xin Xing, Jingyi Ma, Michael W. Mahoney

arXiv:2002.10526v18.072 citations

Originality: Incremental advance

AI Analysis

This work enables statistical inference (e.g., confidence intervals and hypothesis tests) for RandNLA algorithms, which matters to practitioners in data science and statistics. The advance is incremental because it builds on prior analyses of these algorithms as point estimators.

The paper tackles the lack of distributional analysis for Randomized Numerical Linear Algebra (RandNLA) sampling estimators in least-squares problems. It shows that these estimators are asymptotically normal and asymptotically unbiased under mild regularity conditions, and it identifies new optimal sampling probabilities, such as root leverage sampling, that improve over existing methods in the authors' experiments.

The statistical analysis of Randomized Numerical Linear Algebra (RandNLA) algorithms within the past few years has mostly focused on their performance as point estimators. However, this is insufficient for conducting statistical inference, e.g., constructing confidence intervals and hypothesis testing, since the distribution of the estimator is lacking. In this article, we develop an asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem. In particular, we derive the asymptotic distribution of a general sampling estimator with arbitrary sampling probabilities. The analysis is conducted in two complementary settings, i.e., when the objective of interest is to approximate the full sample estimator or is to infer the underlying ground truth model parameters. For each setting, we show that the sampling estimator is asymptotically normally distributed under mild regularity conditions. Moreover, the sampling estimator is asymptotically unbiased in both settings. Based on our asymptotic analysis, we use two criteria, the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic Mean Squared Error (EAMSE), to identify optimal sampling probabilities. Several of these optimal sampling probability distributions are new to the literature, e.g., the root leverage sampling estimator and the predictor length sampling estimator. Our theoretical results clarify the role of leverage in the sampling process, and our empirical results demonstrate improvements over existing methods.
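
To make the idea of a sampling estimator with arbitrary sampling probabilities concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common RandNLA recipe of drawing r rows with replacement and rescaling each drawn row by 1/sqrt(r * pi_i) before solving the smaller least-squares problem. The function names and the exact forms of the schemes are assumptions made for illustration: "root_leverage" is taken to mean probabilities proportional to the square root of the leverage scores, and "predictor_length" to mean probabilities proportional to the Euclidean norm of each row of X.

```python
import numpy as np


def leverage_scores(X):
    """Leverage scores: diagonal of the hat matrix X (X^T X)^{-1} X^T."""
    Q, _ = np.linalg.qr(X)  # reduced QR; columns of Q span the column space of X
    return np.sum(Q ** 2, axis=1)


def sampling_probs(X, scheme="uniform"):
    """Sampling probabilities over the rows of X (hypothetical helper)."""
    n = X.shape[0]
    if scheme == "uniform":
        w = np.ones(n)
    elif scheme == "leverage":
        w = leverage_scores(X)
    elif scheme == "root_leverage":
        # Assumption: proportional to the square root of the leverage scores.
        w = np.sqrt(leverage_scores(X))
    elif scheme == "predictor_length":
        # Assumption: proportional to the Euclidean norm of each row.
        w = np.linalg.norm(X, axis=1)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return w / w.sum()


def sampling_estimator(X, y, probs, r, rng):
    """Least-squares estimate computed from r rows sampled with replacement."""
    n = X.shape[0]
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Rescale each sampled row by 1 / sqrt(r * pi_i).
    scale = 1.0 / np.sqrt(r * probs[idx])
    Xs = X[idx] * scale[:, None]
    ys = y[idx] * scale
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, r = 5000, 4, 500
    X = rng.standard_t(df=3, size=(n, p))  # heavy tails give uneven leverage
    beta_true = np.ones(p)
    y = X @ beta_true + rng.normal(size=n)

    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    for scheme in ("uniform", "leverage", "root_leverage", "predictor_length"):
        beta_s = sampling_estimator(X, y, sampling_probs(X, scheme), r, rng)
        print(scheme, np.linalg.norm(beta_s - beta_full))
```

The printed distances compare each sampling estimator with the full sample estimator, which is the first of the two settings the abstract describes. A single run is only one random draw, so ranking the schemes would require averaging over many repetitions.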
