LGJan 14, 2015

Classification with Low Rank and Missing Data

arXiv:1501.03273v141 citations
Originality Highly original
AI Analysis

This addresses the problem of handling missing data in classification tasks for machine learning practitioners, offering a provable solution that is incremental in improving efficiency over existing methods.

The paper tackles classification and regression with missing data by assuming the clean data lies in a low-rank subspace, and it provides an efficient agnostic algorithm that classifies as well as the best linear classifier combined with the optimal low-dimensional subspace, enabling performance comparable to classifiers with full data access.

We consider classification and regression tasks where we have missing data and assume that the (clean) data resides in a low rank subspace. Finding a hidden subspace is known to be computationally hard. Nevertheless, using a non-proper formulation we give an efficient agnostic algorithm that classifies as good as the best linear classifier coupled with the best low-dimensional subspace in which the data resides. A direct implication is that our algorithm can linearly (and non-linearly through kernels) classify provably as well as the best classifier that has access to the full data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes