Speeding up PCA with priming
This work addresses computational efficiency in PCA for large-scale data analysis, offering a method that is incremental but provides systematic performance gains.
The paper tackles the problem of speeding up principal component analysis (PCA) by introducing primed-PCA (pPCA), a two-step algorithm that first uses an approximate-PCA method for priming and then applies exact PCA in a small subspace, resulting in improved accuracy for a given computational budget, with experiments showing average speedups of 7.2x over Oja's rule and 10.5x over EigenGame.
We introduce primed-PCA (pPCA), a two-step algorithm for speeding up the approximation of principal components. This algorithm first runs any approximate-PCA method to get an initial estimate of the principal components (priming), and then applies an exact PCA in the subspace they span. Since this subspace is of small dimension in any practical use, the second step is extremely cheap computationally. Nonetheless, it improves accuracy significantly for a given computational budget across datasets. In this setup, the purpose of the priming is to narrow down the search space, and prepare the data for the second step, an exact calculation. We show formally that pPCA improves upon the priming algorithm under very mild conditions, and we provide experimental validation on both synthetic and real large-scale datasets showing that it systematically translates to improved performance. In our experiments we prime pPCA by several approximate algorithms and report an average speedup by a factor of 7.2 over Oja's rule, and a factor of 10.5 over EigenGame.