LG DS MLJun 6, 2023

One-sided Matrix Completion from Two Observations Per Row

Steven Cao, Percy Liang, Gregory Valiant

Berkeley

arXiv:2306.04049v13.82 citationsh-index: 102

Originality Incremental advance

AI Analysis

This addresses the challenge of recovering matrix structure from extremely sparse data, which is incremental as it focuses on a specific regime where full completion is impossible.

The paper tackles the problem of one-sided matrix completion, aiming to recover the right singular vectors of a low-rank matrix when only two observations per row are available, and shows that this is possible with at least Ω(r² d log d) rows, outperforming standard methods in synthetic and genome sequencing data.

Given only a few observed entries from a low-rank matrix $X$, matrix completion is the problem of imputing the missing entries, and it formalizes a wide range of real-world settings that involve estimating missing data. However, when there are too few observed entries to complete the matrix, what other aspects of the underlying matrix can be reliably recovered? We study one such problem setting, that of "one-sided" matrix completion, where our goal is to recover the right singular vectors of $X$, even in the regime where recovering the left singular vectors is impossible, which arises when there are more rows than columns and very few observations. We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $Ω(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns. We evaluate our algorithm on one-sided recovery of synthetic data and low-coverage genome sequencing. In these settings, our algorithm substantially outperforms standard matrix completion and a variety of direct factorization methods.

View on arXiv PDF

Similar