LGJan 14, 2015

Classification with Low Rank and Missing Data

arXiv:1501.03273v112.141 citations

Originality Highly original

AI Analysis

This addresses the problem of handling missing data in classification tasks for machine learning practitioners, offering a provable solution that is incremental in improving efficiency over existing methods.

The paper tackles classification and regression with missing data by assuming the clean data lies in a low-rank subspace, and it provides an efficient agnostic algorithm that classifies as well as the best linear classifier combined with the optimal low-dimensional subspace, enabling performance comparable to classifiers with full data access.

We consider classification and regression tasks where we have missing data and assume that the (clean) data resides in a low rank subspace. Finding a hidden subspace is known to be computationally hard. Nevertheless, using a non-proper formulation we give an efficient agnostic algorithm that classifies as good as the best linear classifier coupled with the best low-dimensional subspace in which the data resides. A direct implication is that our algorithm can linearly (and non-linearly through kernels) classify provably as well as the best classifier that has access to the full data.

View on arXiv PDF

Similar