IT, LG, ML · Oct 26, 2017

Optimal Shrinkage of Singular Values Under Random Data Contamination

arXiv:1710.09787v2 · 5 citations
AI Analysis

This work addresses the recovery of low-rank structure from contaminated data, a key problem in machine learning, computer vision, and data science. The contribution is incremental in that it builds on existing frameworks for noise models.

The paper tackles the reconstruction of a low-rank matrix from data contaminated by noise, missing values, outliers, and corrupt entries. It develops an asymptotically optimal algorithm that estimates the matrix by manipulating singular values, and it identifies a signal-to-noise cutoff below which reconstruction fails.

A low-rank matrix X has been contaminated by uniformly distributed noise, missing values, outliers and corrupt entries. Reconstruction of X from the singular values and singular vectors of the contaminated matrix Y is a key problem in machine learning, computer vision and data science. In this paper we show that common contamination models (including arbitrary combinations of uniform noise, missing values, outliers and corrupt entries) can be described efficiently using a single framework. We develop an asymptotically optimal algorithm that estimates X by manipulation of the singular values of Y, which applies to any of the contamination models considered. Finally, we find an explicit signal-to-noise cutoff, below which estimation of X from the singular value decomposition of Y must fail, in a well-defined sense.
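
To make "manipulation of the singular values of Y" concrete, here is a minimal Python sketch of generic singular value shrinkage. It is not the paper's optimal shrinker, which the abstract does not state: the `hard_threshold` rule, the cutoff value 3.0, and the toy contamination (additive Gaussian noise plus a random 80% observation mask) are all illustrative assumptions, as are the function names.

```python
import numpy as np

def shrink_singular_values(Y, shrinker):
    """Estimate a low-rank matrix by applying `shrinker` to the singular values of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = shrinker(s)
    # Scale each left singular vector (column of U) by its shrunken singular value.
    return (U * s_shrunk) @ Vt

def hard_threshold(cutoff):
    """Hypothetical placeholder shrinker: keep singular values above `cutoff`, zero the rest."""
    def shrinker(s):
        return np.where(s > cutoff, s, 0.0)
    return shrinker

# Toy example (assumed setup): a rank-1 signal, additive noise, and missing entries.
rng = np.random.default_rng(0)
n = 200
u = rng.standard_normal((n, 1))
u /= np.linalg.norm(u)
v = rng.standard_normal((n, 1))
v /= np.linalg.norm(v)
X = 5.0 * u @ v.T                                   # rank-1 signal with singular value 5
noise = rng.standard_normal((n, n)) / np.sqrt(n)    # additive noise
mask = rng.random((n, n)) < 0.8                     # True where an entry is observed
Y = mask * (X + noise)                              # contaminated matrix

X_hat = shrink_singular_values(Y, hard_threshold(3.0))
```

Passing the shrinker as a function keeps the SVD step separate from the choice of shrinkage rule, so a rule derived for a specific contamination model could be swapped in without changing the rest of the code.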

