ML ST Feb 17, 2017

How close are the eigenvectors and eigenvalues of the sample and actual covariance matrices?

arXiv:1702.05443v11.0

Originality: Highly original

AI Analysis

This work provides theoretical guarantees for principal component analysis in statistics and machine learning, enabling more reliable dimensionality reduction and data analysis with limited samples.

The paper addresses the sample complexity required for the eigenvectors and eigenvalues of the sample covariance matrix to approximate those of the actual covariance matrix. For a wide family of distributions, including those with finite second moment and those supported in a centered Euclidean ball, it proves that the inner product between eigenvectors of the sample and actual covariance matrices decreases proportionally to the respective eigenvalue distance. This yields non-asymptotic concentration bounds for eigenvectors, eigenspaces, and eigenvalues, as well as conditions for distinguishing principal components from a constant number of samples.

How many samples are sufficient to guarantee that the eigenvectors and eigenvalues of the sample covariance matrix are close to those of the actual covariance matrix? For a wide family of distributions, including distributions with finite second moment and distributions supported in a centered Euclidean ball, we prove that the inner product between eigenvectors of the sample and actual covariance matrices decreases proportionally to the respective eigenvalue distance. Our findings imply non-asymptotic concentration bounds for eigenvectors, eigenspaces, and eigenvalues. They also provide conditions for distinguishing principal components based on a constant number of samples.
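The quantity the abstract refers to, the inner product between eigenvectors of the sample and actual covariance matrices, can be inspected numerically. The following is a minimal sketch and not the paper's method: the function name `eigen_alignment`, the zero-mean Gaussian data, the dimension, and the chosen eigenvalues are all illustrative assumptions.

```python
import numpy as np

def eigen_alignment(n_samples, eigenvalues, seed=0):
    """Compare sample and actual covariance eigenvectors (illustrative sketch).

    Returns (alignment, sample_eigenvalues), where alignment[i, j] is the
    absolute inner product between the i-th sample eigenvector and the
    j-th actual eigenvector. `eigenvalues` must be sorted in descending order.
    """
    rng = np.random.default_rng(seed)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    d = eigenvalues.size

    # Actual eigenvectors: columns of a random orthogonal matrix (assumption).
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))

    # Zero-mean Gaussian samples with covariance q @ diag(eigenvalues) @ q.T.
    # Gaussian data is an assumption; the paper covers a wider family.
    z = rng.standard_normal((n_samples, d))
    x = (z * np.sqrt(eigenvalues)) @ q.T

    # Sample covariance for zero-mean data.
    sample_cov = x.T @ x / n_samples

    # eigh returns eigenvalues in ascending order; flip to descending.
    w, v = np.linalg.eigh(sample_cov)
    w, v = w[::-1], v[:, ::-1]

    return np.abs(v.T @ q), w

if __name__ == "__main__":
    actual = [10.0, 5.0, 2.0, 1.0, 0.5]
    for n in (10, 100, 10000):
        alignment, _ = eigen_alignment(n, actual)
        # Diagonal entries measure how well each sample eigenvector
        # matches its actual counterpart.
        print(n, np.round(np.diag(alignment), 3))
```

In this sketch, the off-diagonal entries of the returned matrix are the inner products between eigenvectors belonging to different eigenvalues, which is the quantity the abstract says shrinks as the eigenvalue distance grows.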
