ST · IT · LG · ME · ML · Jul 26, 2021

Inference for Heteroskedastic PCA with Missing Data

arXiv:2107.12365v2 · 32 citations
Originality: Incremental advance
AI Analysis

The paper addresses the under-explored problem of uncertainty quantification for PCA on high-dimensional data with missing entries and heteroskedastic noise. The advance is incremental: it builds on an existing estimator, HeteroPCA, and extends it from estimation to inference.

The paper constructs confidence regions for principal component analysis (PCA) in high-dimensional settings with missing data and heteroskedastic noise. Its approach, based on the HeteroPCA estimator, comes with non-asymptotic distributional guarantees and yields both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix.

This paper studies how to construct confidence regions for principal component analysis (PCA) in high dimension, a problem that has been vastly under-explored. While computing measures of uncertainty for nonlinear/nonconvex estimators is in general difficult in high dimension, the challenge is further compounded by the prevalent presence of missing data and heteroskedastic noise. We propose a novel approach to performing valid inference on the principal subspace under a spiked covariance model with missing data, on the basis of an estimator called HeteroPCA (Zhang et al., 2022). We develop non-asymptotic distributional guarantees for HeteroPCA, and demonstrate how these can be invoked to compute both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix. Our inference procedures are fully data-driven and adaptive to heteroskedastic random noise, without requiring prior knowledge about the noise levels.
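For orientation, the diagonal-imputation idea behind HeteroPCA can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `hetero_pca`, its parameters, and the fixed iteration count are assumptions made here. It covers only the subspace estimation step, not the paper's confidence regions or intervals. The sketch assumes the input is a sample Gram matrix whose diagonal is contaminated by heteroskedastic noise, so the diagonal is discarded and repeatedly re-imputed from a rank-r approximation.

```python
import numpy as np


def hetero_pca(gram, rank, n_iter=50):
    """Sketch of HeteroPCA-style iterative diagonal imputation.

    gram   : (d, d) symmetric sample Gram/covariance matrix (assumed input).
    rank   : target dimension r of the principal subspace.
    n_iter : number of imputation rounds (hypothetical default).
    Returns a (d, r) matrix with orthonormal columns.
    """
    g = np.array(gram, dtype=float)
    # The diagonal carries the unknown, unequal noise variances: drop it.
    np.fill_diagonal(g, 0.0)
    off_diag = g.copy()

    def top_eig(mat):
        # For a symmetric matrix, the best rank-r approximation keeps the
        # eigenpairs with the largest absolute eigenvalues.
        vals, vecs = np.linalg.eigh(mat)
        idx = np.argsort(np.abs(vals))[::-1][:rank]
        return vals[idx], vecs[:, idx]

    for _ in range(n_iter):
        vals, vecs = top_eig(g)
        low_rank = (vecs * vals) @ vecs.T
        # Keep the observed off-diagonal entries and replace only the
        # diagonal with that of the current rank-r approximation.
        g = off_diag.copy()
        np.fill_diagonal(g, np.diag(low_rank))

    _, vecs = top_eig(g)
    return vecs
```

With missing data, one plausible way to form the input (again an assumption of this sketch) is to zero-fill the unobserved entries of the d-by-n data matrix, rescale by the inverse observation rate, and pass `Y @ Y.T / n` to `hetero_pca`.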

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
