Statistical Optimality and Computational Efficiency of Nyström Kernel PCA
This work addresses computational scalability for researchers and practitioners using kernel PCA, though it is incremental as it builds on existing approximation schemes.
The paper tackles the computational burden of kernel methods in large-scale settings by theoretically analyzing Nyström approximate kernel PCA, showing it matches the statistical accuracy of exact KPCA while being computationally efficient, and outperforms random feature approximations in statistical performance.
Kernel methods provide an elegant framework for developing nonlinear learning algorithms from simple linear methods. Though these methods have superior empirical performance in several real data applications, their usefulness is inhibited by the significant computational burden incurred in large sample situations. Various approximation schemes have been proposed in the literature to alleviate these computational issues, and the approximate kernel machines are shown to retain the empirical performance. However, the theoretical properties of these approximate kernel machines are less well understood. In this work, we theoretically study the trade-off between computational complexity and statistical accuracy in Nyström approximate kernel principal component analysis (KPCA), wherein we show that the Nyström approximate KPCA matches the statistical performance of (non-approximate) KPCA while remaining computationally beneficial. Additionally, we show that Nyström approximate KPCA outperforms the statistical behavior of another popular approximation scheme, the random feature approximation, when applied to KPCA.