ML LGMay 30, 2022

Support Recovery in Sparse PCA with Incomplete Data

arXiv:2205.15215v15.33 citationsh-index: 19

Originality Incremental advance

AI Analysis

This addresses a practical challenge in high-dimensional data analysis, such as genomics, by enabling accurate support recovery from incomplete observations, though it is incremental as it builds on existing SDP relaxation methods.

The paper tackles the problem of recovering the true support of the sparse leading eigenvector in sparse PCA from incomplete and noisy data, achieving exact recovery under conditions involving matrix incoherence, spectral gap, observation probability, and noise variance, with validation on synthetic and gene expression datasets.

We study a practical algorithm for sparse principal component analysis (PCA) of incomplete and noisy data. Our algorithm is based on the semidefinite program (SDP) relaxation of the non-convex $l_1$-regularized PCA problem. We provide theoretical and experimental evidence that SDP enables us to exactly recover the true support of the sparse leading eigenvector of the unknown true matrix, despite only observing an incomplete (missing uniformly at random) and noisy version of it. We derive sufficient conditions for exact recovery, which involve matrix incoherence, the spectral gap between the largest and second-largest eigenvalues, the observation probability and the noise variance. We validate our theoretical results with incomplete synthetic data, and show encouraging and meaningful results on a gene expression dataset.

View on arXiv PDF

Similar