A Concentration of Measure and Random Matrix Approach to Large Dimensional Robust Statistics
This work addresses robust statistical estimation for large-scale data; it is incremental, building on existing methods and supplying theoretical guarantees.
The paper tackles robust covariance matrix estimation for high-dimensional data with large perturbations, proving the existence and uniqueness of the estimator and evaluating its limiting spectral distribution using concentration of measure and random matrix theory.
This article studies the \emph{robust covariance matrix estimation} of a data collection $X = (x_1,\ldots,x_n)$ with $x_i = \sqrt{τ_i}\, z_i + m$, where $z_i \in \mathbb R^p$ is a \textit{concentrated vector} (e.g., an elliptical random vector), $m\in \mathbb R^p$ is a deterministic signal and $τ_i\in \mathbb R$ is a scalar perturbation of possibly large amplitude, under the assumption that both $n$ and $p$ are large. This estimator is defined as the fixed point of a function that we show to be contracting for a so-called \textit{stable semi-metric}. We exploit this semi-metric along with concentration of measure arguments to prove the existence and uniqueness of the robust estimator and to evaluate its limiting spectral distribution.
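To make "the fixed point of a function" concrete, here is a minimal numerical sketch under stated assumptions. The abstract does not spell out the function, so the sketch assumes a Maronna-type M-estimator $\hat C = \frac1n\sum_i u(\frac1p x_i^T \hat C^{-1} x_i)\, x_i x_i^T$ with a hypothetical weight $u(t) = (1+\alpha)/(\alpha+t)$. It also assumes Gaussian $z_i$, heavy-tailed $τ_i = 1/\mathrm{Gamma}(1.5, 1)$ and $m = \mathbf 1_p/\sqrt p$. The $1/p$ normalization, these distributions and the names `generate_data`, `u` and `robust_covariance` are all illustrative choices, not the paper's. The code simply iterates the map from $C = I_p$ until it stops moving; it does not reproduce the stable semi-metric argument that guarantees convergence.

```python
import numpy as np


def generate_data(n, p, rng):
    """Draw X = (x_1, ..., x_n) as columns, with x_i = sqrt(tau_i) z_i + m.

    Illustrative assumptions (not from the paper): z_i ~ N(0, I_p),
    tau_i = 1 / Gamma(1.5, 1) (heavy-tailed), m = 1_p / sqrt(p) so ||m|| = 1.
    """
    z = rng.standard_normal((p, n))
    tau = 1.0 / rng.gamma(shape=1.5, scale=1.0, size=n)
    m = np.ones(p) / np.sqrt(p)
    # sqrt(tau) has shape (n,) and scales each column z_i by sqrt(tau_i).
    return np.sqrt(tau) * z + m[:, None]


def u(t, alpha=0.2):
    """Hypothetical Maronna-type weight u(t) = (1 + alpha) / (alpha + t).

    u is decreasing and t * u(t) increases to 1 + alpha, so samples with a
    large quadratic form (typically a large tau_i) get a small weight.
    """
    return (1.0 + alpha) / (alpha + t)


def robust_covariance(X, alpha=0.2, tol=1e-9, max_iter=2000):
    """Iterate C <- (1/n) sum_i u((1/p) x_i^T C^{-1} x_i) x_i x_i^T from C = I_p."""
    p, n = X.shape
    C = np.eye(p)
    for it in range(1, max_iter + 1):
        # All quadratic forms (1/p) x_i^T C^{-1} x_i at once (column-wise dots).
        q = np.einsum("ij,ij->j", X, np.linalg.solve(C, X)) / p
        # Weighted second-moment matrix: column i of X is scaled by u(q_i).
        C_new = (X * u(q, alpha)) @ X.T / n
        # Stop when the relative Frobenius-norm change is below tol.
        if np.linalg.norm(C_new - C) <= tol * np.linalg.norm(C_new):
            return C_new, it
        C = C_new
    raise RuntimeError("fixed-point iteration did not converge")


rng = np.random.default_rng(0)
p, n = 20, 200
X = generate_data(n, p, rng)
C_hat, n_iter = robust_covariance(X)
S = X @ X.T / n  # plain (non-robust) second-moment matrix, for comparison
print(n_iter, np.linalg.eigvalsh(C_hat)[[0, -1]], np.linalg.eigvalsh(S)[[0, -1]])
```

Comparing the extreme eigenvalues of `C_hat` with those of `S` gives a rough feel for how down-weighting large-$τ_i$ samples affects the spectrum; the sketch makes no claim about the limiting spectral distribution itself.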