LG · IT · Aug 15, 2016

Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated

arXiv:1608.04320v2
Originality: Incremental advance
AI Analysis

This work addresses a gap in PCA theory for data-dependent noise. Its originality is incremental: it extends existing methods rather than introducing a new paradigm.

The paper tackles PCA when data and noise are correlated, a setting not covered by existing theoretical guarantees. It shows that the standard eigenvalue decomposition (EVD) solution remains correct under simple assumptions on the data-noise correlation, and it proposes a generalization, cluster-EVD, that improves performance in certain regimes.

Given a matrix of observed data, Principal Components Analysis (PCA) computes a small number of orthogonal directions that contain most of its variability. Provably accurate solutions for PCA have been in use for a long time. However, to the best of our knowledge, all existing theoretical guarantees for it assume that the data and the corrupting noise are mutually independent, or at least uncorrelated. This assumption often holds in practice, but not always. In this paper, we study the PCA problem in the setting where the data and noise can be correlated. Such noise is often also referred to as "data-dependent noise". We obtain a correctness result for the standard eigenvalue decomposition (EVD) based solution to PCA under simple assumptions on the data-noise correlation. We also develop and analyze a generalization of EVD, cluster-EVD, that improves upon EVD in certain regimes.
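To make the EVD-based solution concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' code and does not implement cluster-EVD. It assumes zero-mean data, so the estimate is the top-r eigenvectors of the uncentered sample covariance, and it assumes an illustrative data-dependent noise model y_t = ℓ_t + w_t with w_t = M_t ℓ_t, where ℓ_t lies in an r-dimensional subspace spanned by P and each M_t is a random matrix of small norm q. The function names `evd_pca`, `subspace_error`, and `simulate` and their parameters are hypothetical. Accuracy is measured as the spectral norm of (I - P̂P̂ᵀ)P, which is 0 when the estimated and true subspaces coincide.

```python
import numpy as np


def evd_pca(Y, r):
    """Top-r eigenvectors of the sample covariance (1/alpha) * Y @ Y.T.

    Y is n x alpha, one observed (assumed zero-mean) vector per column.
    Returns an n x r matrix with orthonormal columns.
    """
    alpha = Y.shape[1]
    sigma_hat = (Y @ Y.T) / alpha
    # eigh handles symmetric matrices and returns eigenvalues in ascending
    # order, so reversing the columns puts the top-r eigenvectors first.
    _, eigvecs = np.linalg.eigh(sigma_hat)
    return eigvecs[:, ::-1][:, :r]


def subspace_error(P_hat, P):
    """Spectral norm of (I - P_hat P_hat^T) P: 0 if the subspaces coincide, at most 1."""
    residual = P - P_hat @ (P_hat.T @ P)
    return np.linalg.norm(residual, 2)


def simulate(n=50, r=3, alpha=1000, q=0.1, seed=0):
    """Hypothetical data-dependent noise model: y_t = l_t + w_t with w_t = M_t l_t."""
    rng = np.random.default_rng(seed)
    # True principal subspace: an n x r orthonormal basis P.
    P, _ = np.linalg.qr(rng.standard_normal((n, r)))
    # Signal l_t = P a_t with decreasing per-direction standard deviations.
    scales = np.linspace(10.0, 5.0, r)[:, None]
    L = P @ (rng.standard_normal((r, alpha)) * scales)
    # One random M_t per sample, rescaled so that ||M_t||_F = q. The spectral
    # norm is at most the Frobenius norm, so ||w_t|| <= q * ||l_t||.
    M = rng.standard_normal((alpha, n, n))
    M *= q / np.linalg.norm(M, axis=(1, 2), keepdims=True)
    W = np.einsum("tij,jt->it", M, L)  # column t is M_t @ l_t
    return P, L + W


# Usage: estimate the subspace from noisy data and report the subspace error.
P, Y = simulate()
P_hat = evd_pca(Y, r=3)
print(subspace_error(P_hat, P))
```

Because the noise here is generated by the data itself (w_t depends on ℓ_t), it is correlated with the signal, which is exactly the setting the abstract says prior guarantees exclude; raising `q` in this toy model is one way to probe how the EVD estimate degrades.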
