A Randomized Algorithm for Sparse PCA based on the Basic SDP Relaxation
This work addresses the computational challenge of sparse PCA for dimensionality reduction in high-dimensional data analysis. It is an incremental improvement backed by explicit approximation guarantees.
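The paper tackles the NP-hard problem of Sparse Principal Component Analysis (SPCA) with a randomized approximation algorithm based on the basic SDP relaxation. If called enough times, the algorithm achieves, with high probability, an approximation ratio of at most the sparsity constant. Under a technical assumption that is consistently satisfied in the paper's numerical tests, its average approximation ratio is also bounded by O(log d), where d is the number of features. The algorithm's efficacy is demonstrated on real-world datasets.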
Sparse Principal Component Analysis (SPCA) is a fundamental technique for dimensionality reduction, and it is NP-hard. In this paper, we introduce a randomized approximation algorithm for SPCA based on the basic SDP relaxation. If called enough times, our algorithm has an approximation ratio of at most the sparsity constant with high probability. Under a technical assumption, which is consistently satisfied in our numerical tests, the average approximation ratio is also bounded by $\mathcal{O}(\log{d})$, where $d$ is the number of features. We show that this technical assumption is satisfied if the SDP solution is low-rank or has exponentially decaying eigenvalues. We then present a broad class of instances for which this technical assumption holds. We also show that in a covariance model, which generalizes the spiked Wishart model, our proposed algorithm achieves a near-optimal approximation ratio. Finally, we demonstrate the efficacy of our algorithm through numerical results on real-world datasets.
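For orientation, the abstract's two central objects can be written out as follows. This is a sketch of the formulation commonly used in the SPCA literature, not a quotation from the paper: the symbols $A$ (a $d \times d$ positive semidefinite input matrix, such as a sample covariance matrix), $k$ (the sparsity constant), and $W$ (the SDP matrix variable) are assumed notation, and the paper's exact statement may differ in details.

$$
\mathrm{SPCA}:\quad \max_{x \in \mathbb{R}^d} \; x^\top A x \quad \text{s.t.} \quad \|x\|_2 = 1, \;\; \|x\|_0 \le k
$$

$$
\mathrm{SDP}:\quad \max_{W \in \mathbb{R}^{d \times d}} \; \operatorname{tr}(A W) \quad \text{s.t.} \quad \operatorname{tr}(W) = 1, \;\; \sum_{i,j} |W_{ij}| \le k, \;\; W \succeq 0
$$

The second problem relaxes the first: for any feasible $x$, the matrix $W = x x^\top$ satisfies $\operatorname{tr}(W) = \|x\|_2^2 = 1$ and $\sum_{i,j} |W_{ij}| = \|x\|_1^2 \le k \|x\|_2^2 = k$ by the Cauchy-Schwarz inequality, while $\operatorname{tr}(A W) = x^\top A x$. Under this reading, an approximation ratio of at most $k$ means that the optimal SPCA value is at most $k$ times the objective value of the vector the algorithm returns.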