Nearly minimax robust estimator of the mean vector by iterative spectral dimension reduction
This provides a robust estimator for statistical inference in high-dimensional data with adversarial corruptions, offering theoretical guarantees and computational efficiency.
The paper tackles robust estimation of the mean vector for sub-Gaussian distributions by introducing a spectral dimension reduction estimator, achieving a minimax-optimal error bound up to a logarithmic factor and a breakdown point of 1/2, with squared error order (r_Σ/n + ε²log(1/ε))log p and runtime order p³ + n p².
We study the problem of robust estimation of the mean vector of a sub-Gaussian distribution. We introduce an estimator based on spectral dimension reduction (SDR) and establish a finite sample upper bound on its error that is minimax-optimal up to a logarithmic factor. Furthermore, we prove that the breakdown point of the SDR estimator is equal to $1/2$, the highest possible value of the breakdown point. In addition, the SDR estimator is equivariant by similarity transforms and has low computational complexity. More precisely, in the case of $n$ vectors of dimension $p$ -- at most $\varepsilon n$ out of which are adversarially corrupted -- the SDR estimator has a squared error of order $\big(\frac{r_Σ}{n} + \varepsilon^2\log(1/\varepsilon)\big){\log p}$ and a running time of order $p^3 + n p^2$. Here, $r_Σ\le p$ is the effective rank of the covariance matrix of the reference distribution. Another advantage of the SDR estimator is that it does not require knowledge of the contamination rate and does not involve sample splitting. We also investigate extensions of the proposed algorithm and of the obtained results in the case of (partially) unknown covariance matrix.