STLGPRMLJun 24, 2020

An $\ell_p$ theory of PCA and spectral clustering

arXiv:2006.14062v38 citations
Originality Highly original
AI Analysis

It addresses the lack of precise characterizations of PCA scores for low-dimensional embeddings, benefiting statistical and machine learning applications, but is incremental as it builds on existing PCA and spectral methods.

The paper develops an ℓ_p perturbation theory for a modified PCA to improve performance with heteroscedastic noise, enabling precise analysis of principal component scores and optimal guarantees for spectral clustering and community detection, achieving information thresholds for exact recovery in models like Gaussian mixtures and stochastic block models.

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing study of PCA focuses on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of individual principal component scores that yield low-dimensional embedding of samples. That hinders the analysis of various spectral methods. In this paper, we first develop an $\ell_p$ perturbation theory for a hollowed version of PCA in Hilbert spaces which provably improves upon the vanilla PCA in the presence of heteroscedastic noises. Through a novel $\ell_p$ analysis of eigenvectors, we investigate entrywise behaviors of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in $\ell_p$ norm, which includes $\ell_2$ and $\ell_\infty$ as special examples. For sub-Gaussian mixture models, the choice of $p$ giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. These also provide optimal recovery results for Gaussian mixture and stochastic block models as special cases.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes