ST · ML · May 23, 2016

Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries

arXiv:1605.07129v5 · 114 citations
Originality: Highly original
AI Analysis

This work addresses robustness issues in statistical applications such as PCA and matrix completion on data with outliers; it is a novel method rather than an incremental improvement.

The authors tackled the problem of estimating the mean of a random matrix with heavy-tailed entries, which includes covariance estimation, by developing a new estimator that achieves sub-Gaussian or sub-exponential concentration in operator norm under only finite second moment assumptions.

Estimation of the covariance matrix has attracted considerable attention from the statistical research community over the years, partially due to important applications such as Principal Component Analysis. However, the frequently used empirical covariance estimator (and its modifications) is very sensitive to outliers in the data. As P. J. Huber wrote in 1964, "...This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance..." Motivated by this question, we develop a new estimator of the (element-wise) mean of a random matrix, which includes the covariance estimation problem as a special case. Assuming that the entries of a matrix possess only a finite second moment, this new estimator admits sub-Gaussian or sub-exponential concentration around the unknown mean in the operator norm. We will explain the key ideas behind our construction, as well as applications to covariance estimation and matrix completion problems.
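The abstract does not spell out the construction, so the following is only a rough illustration of the general idea of taming heavy tails before averaging, not the authors' estimator. It assumes one simple approach: clip the eigenvalues of each symmetric sample matrix to a band `[-tau, tau]` and then average. The function names `spectral_clip` and `truncated_matrix_mean`, the hard-clipping rule, and the threshold `tau` are all hypothetical choices made for this sketch.

```python
import numpy as np


def spectral_clip(a, tau):
    """Clip the eigenvalues of a symmetric matrix to [-tau, tau].

    Hard clipping is an assumption of this sketch; other truncation
    functions could be substituted.
    """
    a = (a + a.T) / 2.0              # symmetrize against round-off
    w, v = np.linalg.eigh(a)         # eigenvalues w, eigenvectors as columns of v
    w_clipped = np.clip(w, -tau, tau)
    return (v * w_clipped) @ v.T     # V diag(w_clipped) V^T


def truncated_matrix_mean(samples, tau):
    """Average the spectrally clipped samples (illustrative only)."""
    return sum(spectral_clip(s, tau) for s in samples) / len(samples)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed vectors (Student t, 5 degrees of freedom); each sample
    # matrix is the outer product y y^T, as in covariance estimation.
    ys = rng.standard_t(5, size=(200, 4))
    samples = [np.outer(y, y) for y in ys]
    empirical = np.mean(samples, axis=0)
    robust = truncated_matrix_mean(samples, tau=20.0)
    print("operator-norm gap:", np.linalg.norm(empirical - robust, 2))
```

With a very large `tau` the sketch reduces to the ordinary empirical mean; a smaller `tau` limits how much any single sample can move the estimate in operator norm, at the cost of some bias. How to choose the threshold is not addressed here.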

