An Online Riemannian PCA for Stochastic Canonical Correlation Analysis
This paper provides a more efficient stochastic algorithm for CCA. The contribution is incremental: it builds on existing Riemannian optimization techniques to address computational bottlenecks in a classical problem.
The paper addresses the efficient extraction of multiple canonical components in canonical correlation analysis (CCA). It proposes an online Riemannian PCA algorithm (RSG+) that achieves O(d^2k) runtime per iteration for the top k components with an O(1/t) convergence rate, improving over existing methods that either have higher per-iteration complexity or extract only a single component.
We present an efficient stochastic algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices. We show how this reparametrization (into structured matrices), simple in hindsight, directly presents an opportunity to repurpose/adjust mature techniques for numerical optimization on Riemannian manifolds. Our developments nicely complement existing methods for this problem, which either require $O(d^3)$ time complexity per iteration with an $O(\frac{1}{\sqrt{t}})$ convergence rate (where $d$ is the dimensionality) or extract only the top $1$ component with an $O(\frac{1}{t})$ convergence rate. In contrast, our algorithm offers a strict improvement for this classical problem: it achieves $O(d^2k)$ runtime complexity per iteration for extracting the top $k$ canonical components with an $O(\frac{1}{t})$ convergence rate. While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising. We also explore a potential application in training fair models where the label of the protected attribute is missing or otherwise unavailable.
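For orientation, the following is a minimal sketch of the classical batch solution to CCA, which makes concrete what "the top $k$ canonical components" refers to. It is not the RSG+ algorithm and does not use the reparametrization described above; it is the standard whitening-plus-SVD approach, which requires full eigendecompositions of the covariance matrices and so costs on the order of $d^3$ for the whole dataset. The function name `batch_cca` and the small ridge term `reg` are illustrative assumptions, not names from the paper.

```python
import numpy as np

def batch_cca(X, Y, k, reg=1e-8):
    """Top-k canonical directions via whitening + SVD.

    Batch reference only (NOT RSG+). `reg` is a small ridge term,
    added here as an assumption to keep the covariances invertible.
    """
    n = X.shape[0]
    # Center both views.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Empirical (regularized) covariance and cross-covariance matrices.
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, Q = np.linalg.eigh(C)
        return Q @ np.diag(w ** -0.5) @ Q.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, returned in descending order.
    A, s, Bt = np.linalg.svd(Wx @ Cxy @ Wy)
    U = Wx @ A[:, :k]   # projection for view X, satisfies U^T Cxx U = I
    V = Wy @ Bt[:k].T   # projection for view Y, satisfies V^T Cyy V = I
    return U, V, s[:k]
```

Calling `U, V, corr = batch_cca(X, Y, k=2)` on two data matrices with the same number of rows returns the two projection matrices and the top two canonical correlations. A stochastic method such as the one proposed in the paper aims to reach the same subspaces from streaming samples without forming these decompositions.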