STLGApr 5, 2022

Nearly minimax robust estimator of the mean vector by iterative spectral dimension reduction

arXiv:2204.02323v12 citationsh-index: 28
Originality Highly original
AI Analysis

This provides a robust estimator for statistical inference in high-dimensional data with adversarial corruptions, offering theoretical guarantees and computational efficiency.

The paper tackles robust estimation of the mean vector for sub-Gaussian distributions by introducing a spectral dimension reduction estimator, achieving a minimax-optimal error bound up to a logarithmic factor and a breakdown point of 1/2, with squared error order (r_Σ/n + ε²log(1/ε))log p and runtime order p³ + n p².

We study the problem of robust estimation of the mean vector of a sub-Gaussian distribution. We introduce an estimator based on spectral dimension reduction (SDR) and establish a finite sample upper bound on its error that is minimax-optimal up to a logarithmic factor. Furthermore, we prove that the breakdown point of the SDR estimator is equal to $1/2$, the highest possible value of the breakdown point. In addition, the SDR estimator is equivariant by similarity transforms and has low computational complexity. More precisely, in the case of $n$ vectors of dimension $p$ -- at most $\varepsilon n$ out of which are adversarially corrupted -- the SDR estimator has a squared error of order $\big(\frac{r_Σ}{n} + \varepsilon^2\log(1/\varepsilon)\big){\log p}$ and a running time of order $p^3 + n p^2$. Here, $r_Σ\le p$ is the effective rank of the covariance matrix of the reference distribution. Another advantage of the SDR estimator is that it does not require knowledge of the contamination rate and does not involve sample splitting. We also investigate extensions of the proposed algorithm and of the obtained results in the case of (partially) unknown covariance matrix.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes