On the Sample Complexity of Subspace Learning
This work addresses the sample complexity of subspace learning, a foundational problem in machine learning. The contribution appears incremental, however, since it builds on existing operator-theoretic approaches to refine error estimates.
The paper tackles the problem of estimating linear subspaces from samples, a task common to machine learning algorithms such as PCA and spectral embedding. It derives novel learning error estimates under natural spectral assumptions on the data distribution, and obtains sharp error bounds for PCA and spectral support estimation as special cases.
A large number of algorithms in machine learning, from principal component analysis (PCA) and its non-linear (kernel) extensions to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples. In this paper we introduce a general formulation of this problem and derive novel learning error estimates. Our results rely on natural assumptions on the spectral properties of the covariance operator associated with the data distribution, and hold for a wide class of metrics between subspaces. As special cases, we discuss sharp error estimates for the reconstruction properties of PCA and spectral support estimation. Key to our analysis is an operator-theoretic approach that has broad applicability to spectral learning methods.
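To make the setting concrete, here is a minimal numerical sketch of the subspace estimation problem, not the paper's method or analysis. It rests on several assumptions that do not come from the text: a finite-dimensional Gaussian distribution whose covariance eigenvalues decay polynomially, the uncentered empirical covariance as the estimator, and the operator norm of the difference of orthogonal projections as one possible metric between subspaces. The function names and parameters are hypothetical.

```python
import numpy as np


def top_k_subspace(X, k):
    """Orthonormal basis (d x k) of the top-k eigenspace of the
    empirical, uncentered covariance of the rows of X."""
    C = X.T @ X / X.shape[0]
    _, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]      # keep the k leading eigenvectors


def projection_distance(U, V):
    """Operator norm of the difference between the orthogonal
    projections onto span(U) and span(V)."""
    return np.linalg.norm(U @ U.T - V @ V.T, ord=2)


def error_by_sample_size(seed=0, d=20, k=3, sizes=(50, 500, 5000)):
    """Distance between the estimated and the true top-k subspace
    for several sample sizes (illustrative setup, assumed here)."""
    rng = np.random.default_rng(seed)
    # Assumed spectrum: standard deviations 1/j, so covariance eigenvalues 1/j^2.
    sigma = np.arange(1, d + 1, dtype=float) ** -1.0
    # The covariance is diagonal, so the true top-k subspace is spanned
    # by the first k coordinate vectors.
    U_true = np.eye(d)[:, :k]
    errors = {}
    for n in sizes:
        X = rng.standard_normal((n, d)) * sigma
        errors[n] = projection_distance(top_k_subspace(X, k), U_true)
    return errors
```

Calling `error_by_sample_size()` returns a dictionary mapping each sample size to the distance between the estimated and true subspaces. In this toy setup one would typically expect the distance to shrink as the sample size grows, though a single random draw does not guarantee it.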