MLLGOct 19, 2020

Robust High Dimensional Expectation Maximization Algorithm via Trimmed Hard Thresholding

arXiv:2010.09576v1
Originality Incremental advance
AI Analysis

This addresses robust estimation for high-dimensional sparse data with corruptions, offering a method with theoretical guarantees, but it is incremental as it builds on existing EM frameworks.

The paper tackles the problem of estimating latent variable models with corrupted samples in high-dimensional sparse settings by proposing a Trimmed (Gradient) Expectation Maximization algorithm, which converges geometrically to near-optimal statistical rates when the corruption fraction is bounded by ˜O(1/√n), as validated by numerical results on models like mixture of Gaussians.

In this paper, we study the problem of estimating latent variable models with arbitrarily corrupted samples in high dimensional space ({\em i.e.,} $d\gg n$) where the underlying parameter is assumed to be sparse. Specifically, we propose a method called Trimmed (Gradient) Expectation Maximization which adds a trimming gradients step and a hard thresholding step to the Expectation step (E-step) and the Maximization step (M-step), respectively. We show that under some mild assumptions and with an appropriate initialization, the algorithm is corruption-proofing and converges to the (near) optimal statistical rate geometrically when the fraction of the corrupted samples $ε$ is bounded by $ \tilde{O}(\frac{1}{\sqrt{n}})$. Moreover, we apply our general framework to three canonical models: mixture of Gaussians, mixture of regressions and linear regression with missing covariates. Our theory is supported by thorough numerical results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes