Heteroskedastic PCA: Algorithm, Optimality, and Applications
This work addresses a fundamental issue in high-dimensional statistics and is relevant to researchers and practitioners who work with noisy data. It offers a novel solution with broad applications, though it builds on existing PCA methods.
The paper tackles principal component analysis (PCA) in the presence of heteroskedastic noise. It introduces a general framework and an algorithm, HeteroPCA, which iteratively imputes the diagonal entries of the sample covariance matrix to remove the bias caused by heteroskedasticity. The procedure is shown to be computationally efficient and provably optimal under a generalized spiked covariance model.
A general framework for principal component analysis (PCA) in the presence of heteroskedastic noise is introduced. We propose an algorithm called HeteroPCA, which involves iteratively imputing the diagonal entries of the sample covariance matrix to remove estimation bias due to heteroskedasticity. This procedure is computationally efficient and provably optimal under the generalized spiked covariance model. A key technical step is a deterministic robust perturbation analysis on singular subspaces, which can be of independent interest. The effectiveness of the proposed algorithm is demonstrated in a suite of problems in high-dimensional statistics, including singular value decomposition (SVD) under heteroskedastic noise, Poisson PCA, and SVD for heteroskedastic and incomplete data.
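The diagonal-imputation idea described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' reference implementation. It assumes the input is a symmetric sample covariance matrix, that the target rank is known in advance, and that a fixed number of iterations is run. The names hetero_pca, sigma_hat, rank, and n_iter are hypothetical. The motivation, assuming noise that is independent across coordinates, is that unequal noise variances inflate only the diagonal of the covariance matrix, so the sketch discards the diagonal and repeatedly refills it from a low-rank approximation.

```python
import numpy as np


def hetero_pca(sigma_hat, rank, n_iter=50):
    """Sketch of iterative diagonal imputation (hypothetical helper).

    sigma_hat : (p, p) symmetric sample covariance matrix (assumed input).
    rank      : number of principal components to estimate (assumed known).
    n_iter    : fixed number of imputation passes (assumed stopping rule).
    Returns a (p, rank) matrix with orthonormal columns.
    """
    n_mat = np.array(sigma_hat, dtype=float, copy=True)
    # Discard the diagonal, which is where heteroskedastic noise adds bias.
    np.fill_diagonal(n_mat, 0.0)

    for _ in range(n_iter):
        # Best rank-`rank` approximation of the current matrix via SVD.
        u, s, vt = np.linalg.svd(n_mat)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Keep the off-diagonal entries; impute the diagonal from the
        # low-rank approximation.
        np.fill_diagonal(n_mat, np.diag(low_rank))

    # Leading left singular vectors of the final matrix.
    u, _, _ = np.linalg.svd(n_mat)
    return u[:, :rank]
```

Called as hetero_pca(np.cov(data, rowvar=False), rank=r) on an n-by-p data array, the function returns an estimate of the leading r-dimensional principal subspace. The fixed iteration count is a simplification chosen for brevity; a tolerance-based stopping rule would be a natural alternative.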