Andrew Lee, Harlin Lee, Jose A. Perea et al.
Many real-world datasets live on high-dimensional Stiefel and Grassmannian manifolds, $V_k(\mathbb{R}^N)$ and $Gr(k, \mathbb{R}^N)$ respectively, and benefit from projection onto lower-dimensional Stiefel and Grassmannian manifolds. In this work, we propose an algorithm called \textit{Principal Stiefel Coordinates (PSC)} to reduce data dimensionality from $V_k(\mathbb{R}^N)$ to $V_k(\mathbb{R}^n)$ in an \textit{$O(k)$-equivariant} manner ($k \leq n \ll N$). We begin by observing that each element $\alpha \in V_n(\mathbb{R}^N)$ defines an isometric embedding of $V_k(\mathbb{R}^n)$ into $V_k(\mathbb{R}^N)$. Next, we describe two ways of finding a suitable embedding map $\alpha$: one via an extension of principal component analysis ($\alpha_{PCA}$), and one that further minimizes data fit error using gradient descent ($\alpha_{GD}$). Then, we define a continuous and $O(k)$-equivariant map $\pi_\alpha$ that acts as a ``closest point operator'' to project the data onto the image of $V_k(\mathbb{R}^n)$ in $V_k(\mathbb{R}^N)$ under the embedding determined by $\alpha$, while minimizing distortion. Because this dimensionality reduction is $O(k)$-equivariant, these results extend to Grassmannian manifolds as well. Lastly, we show that $\pi_{\alpha_{PCA}}$ globally minimizes projection error in a noiseless setting, while $\pi_{\alpha_{GD}}$ achieves a meaningfully different and improved outcome when the data do not lie exactly on the image of a linearly embedded lower-dimensional Stiefel manifold. We perform multiple numerical experiments on synthetic and real-world data.
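The pipeline described above (embed, choose $\alpha$, project) can be illustrated with a minimal NumPy sketch. It rests on assumptions that are ours, not necessarily the paper's exact definitions: the embedding is taken to be $X \mapsto \alpha X$; $\pi_\alpha$ is taken to be the polar factor (closest matrix with orthonormal columns) of $\alpha^\top Y$; and $\alpha_{PCA}$ is taken to be the top-$n$ eigenvectors of $\sum_i Y_i Y_i^\top$. All function names are hypothetical, and the gradient-descent variant $\alpha_{GD}$ is not sketched.

```python
import numpy as np

def embed(alpha, X):
    """Map X in V_k(R^n) to alpha @ X in V_k(R^N).

    alpha is N x n with orthonormal columns, so alpha @ X again has
    orthonormal columns (assumed form of the embedding)."""
    return alpha @ X

def polar_factor(A):
    """Closest matrix with orthonormal columns to A in Frobenius norm,
    computed as U @ Vt from a thin SVD (unique when A has full column rank)."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def project(alpha, Y):
    """Hypothetical closest-point map V_k(R^N) -> V_k(R^n).

    Since polar_factor(A @ g) == polar_factor(A) @ g for g in O(k),
    this map is O(k)-equivariant."""
    return polar_factor(alpha.T @ Y)

def alpha_pca(Ys, n):
    """Hypothetical PCA-style choice of alpha: top-n eigenvectors of sum_i Y_i Y_i^T."""
    M = sum(Y @ Y.T for Y in Ys)
    _, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n]   # keep the n leading eigenvectors
```

For noiseless data $Y_i = \alpha X_i$, the leading eigenvectors span the column space of $\alpha$, so in this sketch `embed(a, project(a, Y_i))` recovers each $Y_i$ exactly when `a = alpha_pca(Ys, n)`. The polar factor is used because it is the Frobenius-nearest point on the Stiefel manifold and commutes with right multiplication by $O(k)$.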